JOURNAL OF LIGHTWAVE TECHNOLOGY, VOL. 33, NO. 7, APRIL 1, 2015


Application of Machine Learning Techniques for Amplitude and Phase Noise Characterization

Darko Zibar, Member, IEEE, Luis Henrique Hecker de Carvalho, Molly Piels, Andy Doberstein, Julio Diniz, Bernd Nebendahl, Carolina Franciscangelis, Jose Estaran, Member, IEEE, Hansjoerg Haisch, Neil G. Gonzalez, Julio Cesar R. F. de Oliveira, and Idelfonso Tafur Monroy

(Invited Paper)

Abstract—In this paper, tools from the machine learning community, such as Bayesian filtering and expectation maximization parameter estimation, are presented and employed for laser amplitude and phase noise characterization. We show that phase noise estimation based on Bayesian filtering outperforms the conventional time-domain approach in the presence of moderate measurement noise. Additionally, carrier synchronization based on Bayesian filtering, in combination with expectation maximization, is demonstrated experimentally for the first time.

Index Terms—Bayesian filtering, expectation maximization, optical communication, phase noise, synchronization.

I. INTRODUCTION

COMBINATION of advanced modulation formats and digital signal processing (DSP) assisted coherent detection has had a significant impact on the field of optical communication [1]. In particular, much research effort has focused on polarization multiplexed quadrature amplitude modulation (QAM) with 16 levels, as it offers a good compromise between relatively high spectral efficiency and transmission distance [2]. The next step in increasing the spectral efficiency of coherent optical communication systems is to move towards 64-QAM and beyond [3], [4]. One of the challenges associated with 64-QAM is the need for robust carrier frequency and phase synchronization. Related to this is the need to accurately characterize laser amplitude and phase noise, both as a design tool for the lasers themselves and to optimize DSP carrier synchronization algorithms. For the characterization of laser phase noise, coherent detection in combination with DSP has been
Manuscript received October 3, 2014; revised November 22, 2014 and January 15, 2015; accepted January 18, 2015. Date of publication January 20, 2015; date of current version March 4, 2015. This work was supported by the Danish Council for Independent Research under Project CORESON and the Villum Foundation Young Investigator program.
D. Zibar, M. Piels, J. Estaran, and I. T. Monroy are with DTU Fotonik, Department of Photonics Engineering, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).
L. H. H. de Carvalho, J. Diniz, C. Franciscangelis, N. G. Gonzalez, and J. C. R. F. de Oliveira are with the Centro de Pesquisa e Desenvolvimento em Telecomunicações, Campinas, SP 13086-902, Brazil (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]).
A. Doberstein, B. Nebendahl, and H. Haisch are with Keysight Technologies, 71034 Boeblingen, Germany (e-mail: [email protected]; [email protected]; [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JLT.2015.2394808

proposed and demonstrated [5], [6]. The limitation of the methods proposed in [5] and [6] is that, in the presence of moderate measurement noise, the phase noise estimation becomes inaccurate. Therefore, to obtain the highest statistical accuracy, methods based on maximum likelihood (ML) or maximum a posteriori (MAP) estimation need to be employed. The field of machine learning offers powerful statistical signal processing tools that can be applied for accurate amplitude and phase noise characterization. A significant advantage of the machine learning tools is that the physics of the channel or device can be included in the characterization and demodulation algorithms [7], [8]. Additionally, non-white and non-Gaussian noise can be accounted for. Finally, methods from machine learning can be used to learn impairments from the observed data and build a probabilistic model of the impairment [7], [9].
In this paper, a framework of Bayesian filtering in combination with expectation maximization is presented and applied to accurately characterize laser amplitude and phase noise. Specifically, it is investigated how Bayesian filtering can be used to mitigate the limitations imposed by measurement noise and obtain accurate amplitude and phase noise estimates. It is proposed to use joint amplitude and phase noise MAP estimation, obtained by recursively computing the posterior probability. Three special cases of Bayesian filtering are considered: the particle filter, the extended Kalman filter (EKF) and the extended Kalman smoother (EKS) [10]. Finally, the methods based on Bayesian filtering are adopted and applied to joint carrier frequency and phase synchronization for up to 192 Gb/s PDM 64-QAM. Carrier phase estimation based on the EKF has been numerically investigated in [11], and very recently experimentally demonstrated for 16-QAM [12].
However, no solution has yet been presented on how to estimate the Kalman filtering model parameters, such as the state transition matrix and the variance of the process noise, which for the considered case is the variance of the amplitude and phase noise. These parameters are part of the initialization of the Kalman filter, and for robust performance they must be accurately estimated. In this paper, we perform ML estimation of the state transition matrix and the process noise variance by employing expectation maximization within the Bayesian filtering framework.
The remainder of this paper is organized as follows. In Section II, an introduction to the main concepts behind Bayesian

0733-8724 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications standards/publications/rights/index.html for more information.

filtering are presented. We introduce the probabilistic state-space model (P-SSM) and show that the Kalman filtering framework is a special case of Bayesian filtering. The limitations of the Kalman filtering framework are pointed out, and in order to overcome them the particle filtering framework is introduced. The particle filtering framework relies on a Monte Carlo approach in which the estimated means are approximated using the concept of importance sampling [7]. It is also shown how to discretize the state-space model (SSM) using the stochastic Euler scheme and thereby obtain the SSM for joint tracking of amplitude, phase and frequency. The section ends by demonstrating how expectation maximization can be employed for parameter estimation of the SSM. In Section III, it is demonstrated, experimentally and by numerical simulations, that the Bayesian filtering approach is a powerful tool for estimating laser phase noise even in the presence of low signal-to-noise ratio (SNR). We also demonstrate joint tracking of amplitude and phase noise. Finally, carrier synchronization using the Bayesian framework with expectation maximization based parameter estimation is demonstrated experimentally for polarization multiplexed 64-QAM signals and compared to blind phase search (BPS). The conclusions are presented in Section IV.

II. INTRODUCTION TO BAYESIAN FILTERING

In this section, the main concepts and methods behind Bayesian filtering are introduced. We provide a summary of Bayesian filtering methods for amplitude and phase noise characterization, with the aim of aiding the reader without a background in Bayesian filtering. For a more detailed treatment of Bayesian filtering, see [7], [13] and [14]. We first introduce the SSM representation and show how it can be used to describe the optical fibre communication channel and to perform parameter estimation. In this paper, only sampled systems are considered, which means that the SSM is specified at discrete time instants k ∈ N. Taking into account the interactions between optical fibre channel impairments and noise along the channel, the optical fibre communication channel can be considered a nonlinear dynamical system. The SSM offers a general and very powerful tool to learn, model, and analyze nonlinear dynamical systems. The SSM consists of the following equations:

x_k = f(x_{k-1}, v_{k-1})    (1)

y_k = g(x_k, n_k)    (2)

where f and g are possibly nonlinear mapping functions. The variable x_k ∈ R^n is the state (hidden, or non-observable) variable of the dynamical system that we want to estimate, and n is the dimension of the state vector. The state x_k can represent various dynamical parameters of the system, such as the transmitted data sequence, amplitude noise, phase noise, equalization enhanced phase noise, polarization mode dispersion and nonlinear phase noise. The variable v_k is the process noise, or uncertainty, associated with the state vector. The variable y_k ∈ R^m represents the measurement/observable variables, which are the samples after the analog-to-digital converter in coherent optical systems, and m is the dimension of the measurement vector. Finally, the variable n_k is the measurement noise; in optical communication systems it is related to the noise generated by optical amplification.
In general, it is more convenient to express the SSM in equations (1) and (2) in terms of the P-SSM. The justification is that the P-SSM is more suitable for the Bayesian approach, especially for nonlinear SSMs. Equations (1) and (2) can be expressed in terms of the P-SSM as follows:

x_0 ∼ p(x_0)    (3)

x_k ∼ p(x_k | x_{k-1})    (4)

y_k ∼ p(y_k | x_k).    (5)

The stochastic dynamics of the state vector x_k are characterized by p(x_k | x_{k-1}), which describes the transition probability associated with the uncertainties of the dynamical model and is governed by the statistics of the process noise v_k. The prior of the state vector at time instant k = 0 is specified by p(x_0). It is assumed that the states form a Markov model, for which the joint probability density function is expressed as [7]:

p(x_0, . . . , x_k) = p(x_0) ∏_{l=1}^{k} p(x_l | x_{l-1})    (6)

where the conditional distribution is given by:

p(x_k | x_1, . . . , x_{k-1}) = p(x_k | x_{k-1})    (7)

which means that the current state x_k, given x_{k-1}, is independent of what happened before time instant k − 1. The variable y_k ∈ R^m represents the observable variables that we can measure. The measured data are characterized by the conditional probability density function p(y_k | x_k) and are conditionally independent of the measurements and states before time k:

p(y_k | x_{1:k}, y_{1:k-1}) = p(y_k | x_k).    (8)

The central idea in Bayesian filtering is to compute the mean m_k of the posterior distribution p(x_k | y_{1:k}) of the state vector x_k given the measurements up to time step k, i.e., y_{1:k}. The computed mean m_k corresponds to the minimum mean squared error (MSE) estimate of the state x_k [14]; this is why Bayesian filtering is commonly referred to as optimal filtering (in the MSE sense). The mean of the state vector is computed as follows:

m_k = E[x_k] = ∫ x_k p(x_k | y_{1:k}) dx_k.    (9)

In addition, we are also interested in the covariance:

P_k = E[(x_k − m_k)(x_k − m_k)^H] = ∫ (x_k − m_k)(x_k − m_k)^H p(x_k | y_{1:k}) dx_k    (10)

where H denotes Hermitian transposition (transposition combined with complex conjugation). The mean of the states can be computed recursively, by first computing the predicted distribution p(x_k | y_{1:k-1}), i.e., we predict the distribution for time instant k while only having the measurements up to time instant k − 1:

p(x_k | y_{1:k-1}) = ∫ p(x_k | x_{k-1}) p(x_{k-1} | y_{1:k-1}) dx_{k-1}.    (11)

Once the new measurement y_k arrives at time instant k, the posterior distribution is updated:

p(x_k | y_{1:k}) = p(y_k | x_k) p(x_k | y_{1:k-1}) / ∫ p(y_k | x_k) p(x_k | y_{1:k-1}) dx_k.    (12)

It should be noted that the denominator in equation (12) can be considered a normalization constant. Equations (9)–(12) constitute the general framework of Bayesian filtering. The transition probability p(x_k | x_{k-1}) and the conditional distribution of the measurements p(y_k | x_k) enter the Bayesian filtering equations, and it is therefore convenient to express the dynamics of the system in terms of the P-SSM. This is especially relevant in the context of particle filtering, where the importance weights are directly proportional to p(y_k | x_k) and the states are drawn from the transition probability p(x_k | x_{k-1}).

Fig. 1. Illustration of Bayesian filtering for tracking of a dynamic state (phase noise). Evolution of the phase noise as a function of time.

Fig. 1 illustrates an example of Bayesian filtering used to track the dynamic state vector x_k, which for the considered case can represent phase noise. In Fig. 1, the evolution of the dynamic state x(t) is shown as a function of time. The red curve illustrates the true trajectory of the dynamic state, and the red circles illustrate the true values, x_k, of the state at successive discrete time steps. The blue circles indicate the noisy observations y_k. The task is to estimate the true values x_k at successive time steps from the noisy observations y_k. It can be observed that the noisy observations y_k are relatively far away from the true values. The solid green circles are the mean values m_k of the true state vector x_k, obtained from the posterior distribution of the states p(x_k | y_{1:k}); the open green circle indicates the covariance inferred from the posterior distribution. It can be observed that the solid green circles are very close to the true positions x_k, as the noise has been filtered out; hence the estimation error is low.

A. Kalman Filtering Framework

Under certain conditions, there is an analytical solution to the Bayesian filtering equations (9)–(12). This is the case when the state-space functions f and g are linear, and the process noise v_k and measurement noise n_k are additive and Gaussian. The corresponding linear SSM is then expressed as:

x_k = A_k x_{k-1} + v_{k-1}    (13)

y_k = H_k x_k + n_k    (14)

where n_k ∼ N(0, Σ_k) and v_k ∼ N(0, Q_k), with N(·) denoting the Gaussian probability density function. The covariance matrices of the measurement and process noise are Σ_k and Q_k, respectively. A_k is the state transition matrix; it expresses how the components of the state vector x_k are related between the discrete time instants k and k − 1, and also how the different components of the state vector are related to each other. H_k is the measurement model matrix and expresses how the components of the state vector x_k are related to the components of the measurement vector y_k. The quantity to be estimated is the state vector x_k. The P-SSM is then expressed as:

p(x_k | x_{k-1}) = N(x_k | A_k x_{k-1}, Q_k)    (15)

p(y_k | x_k) = N(y_k | H_k x_k, Σ_k).    (16)
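As an illustration of the generative view in (15) and (16), the linear Gaussian P-SSM can be sampled directly; the scalar random-walk model and the parameter values below are assumptions chosen for illustration only, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative scalar model (assumed values, not the paper's parameters):
# random-walk state observed through a noisy linear measurement.
A = np.array([[1.0]])          # state transition matrix A_k
H = np.array([[1.0]])          # measurement model matrix H_k
Q = np.array([[1e-3]])         # process-noise covariance Q_k
Sigma = np.array([[1e-1]])     # measurement-noise covariance Sigma_k

K = 500
x = np.zeros((K, 1))           # states x_k
y = np.zeros((K, 1))           # measurements y_k
xk = np.zeros(1)               # x_0 ~ p(x_0), here a point mass at zero
for k in range(K):
    # x_k ~ p(x_k | x_{k-1}) = N(A_k x_{k-1}, Q_k), eq. (15)
    xk = A @ xk + rng.multivariate_normal(np.zeros(1), Q)
    x[k] = xk
    # y_k ~ p(y_k | x_k) = N(H_k x_k, Sigma_k), eq. (16)
    y[k] = H @ xk + rng.multivariate_normal(np.zeros(1), Sigma)
```

Drawing x_k from the transition density and y_k from the measurement density in this way is exactly the sampling step that the particle filter of Section II-B relies on.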

The solution to the linear Bayesian filtering problem reduces to the Kalman filter equations [13]:

p(x_k | y_{1:k}) = N(x_k | m_k, P_k)    (17)

where m_k is the mean of the posterior distribution. It is computed with the following Kalman filtering equations, consisting of a prediction step, m_k^p = E[x_k | y_{1:k-1}], and an update step, m_k = E[x_k | y_{1:k}], expressed as [14]:

m_k^p = A_{k-1} m_{k-1}    (18)

P_k^p = A_{k-1} P_{k-1} A_{k-1}^T + Q_{k-1}    (19)

e_k = y_k − H_k m_k^p    (20)

S_k = H_k P_k^p H_k^T + Σ_k    (21)

K_k = P_k^p H_k^T S_k^{-1}    (22)

m_k = m_k^p + K_k e_k    (23)

P_k = P_k^p − K_k S_k K_k^T.    (24)
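As a concrete sketch, the recursion (18)–(24) takes only a few lines; the scalar random-walk model and parameter values below are illustrative assumptions, not the paper's experimental configuration.

```python
import numpy as np

def kalman_filter(y, A, H, Q, Sigma, m0, P0):
    """Linear Kalman filter, eqs. (18)-(24): prediction then measurement update."""
    m, P = m0, P0
    means = []
    for yk in y:
        mp = A @ m                        # (18) predicted mean
        Pp = A @ P @ A.T + Q              # (19) predicted covariance
        e = yk - H @ mp                   # (20) innovation (measurement error)
        S = H @ Pp @ H.T + Sigma          # (21) innovation covariance
        K = Pp @ H.T @ np.linalg.inv(S)   # (22) Kalman gain
        m = mp + K @ e                    # (23) updated mean
        P = Pp - K @ S @ K.T              # (24) updated covariance
        means.append(m.copy())
    return np.asarray(means)

# Toy example: scalar random walk observed in additive Gaussian noise.
rng = np.random.default_rng(1)
A = np.eye(1); H = np.eye(1)
Q = np.array([[1e-3]]); Sigma = np.array([[1e-1]])
x = np.cumsum(np.sqrt(Q[0, 0]) * rng.standard_normal(200))
y = x[:, None] + np.sqrt(Sigma[0, 0]) * rng.standard_normal((200, 1))
m = kalman_filter(y, A, H, Q, Sigma, np.zeros(1), np.eye(1))
# The filtered means track x with lower error than the raw measurements.
```

Because Q is much smaller than Σ here, the gain K_k stays small and the filter leans on the prediction, which is the behaviour described in the text below.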

The core of the Kalman filtering equations (18)–(24) is as follows: first, the mean m_k^p for time instant k is predicted while only measurements up to k − 1 are available; then the new measurement y_k at time instant k is acquired. The error e_k between the "true" and the predicted measurement, y_k and H_k m_k^p respectively, is computed, scaled by the Kalman gain K_k, and used to update the mean m_k for the discrete time instant k. The Kalman gain K_k expresses how much the predicted mean m_k^p needs to be changed given the measurement y_k. The variable S_k represents the estimated covariance matrix of the measurement y_k. When S_k is large, the Kalman gain is small and the updated mean m_k is mostly determined by the predicted value m_k^p. On the other hand, when S_k is small, we can have confidence in the measurements, and the updated mean m_k is then dominated by the measurement data. The variables P_k^p and P_k are the estimated covariance matrices of the states and express the amount of variation of the states x_k. If the estimated state covariance matrix P_k^p is small, the state vector x_k does not change much and the Kalman gain is small. In that case, the difference between the means at time instants k and k − 1 is small, as expected.
Once we deviate from the linear SSM, we can no longer provide an analytical solution for the Bayesian filtering equations. In this paper, we consider both amplitude and phase noise fluctuations, which results in a nonlinear measurement equation for y_k. The state vector x_k therefore consists of components representing the amplitude, x_k^am, and phase, x_k^ph, noise fluctuations, i.e., x_k = [x_k^am, x_k^ph]^T, where T denotes transposition. The SSM is then expressed as:

x_k = A_k x_{k-1} + v_{k-1}    (25)

y_k = (1 + x_k^am) e^{i x_k^ph} + n_k.    (26)

Using Bayesian filtering, we can then jointly track both amplitude and phase noise fluctuations and compute the corresponding mean of the posterior distribution, m_k = [m_k^am, m_k^ph]^T. As the noise terms are still assumed to be additive and Gaussian, the P-SSM for (25) and (26) is expressed as in equations (15) and (16). It should be noted that the relationship between the measurement data y_k and the states x_k is no longer linear, because y_k is now related to x_k^ph through the nonlinear function e^{i(·)}. The Kalman filtering equations (18)–(24) can therefore not be applied directly. Instead, approximation methods such as the EKF, approximate grid methods or particle filtering must be used [13]. In this paper, we focus on the EKF and particle filtering.
As the SSM is nonlinear, the EKF relies on approximating the filtering densities [14]:

p(x_k | y_{1:k}) ≈ N(x_k | m_k, P_k).    (27)

To obtain the approximate filtering density in equation (27), the nonlinear measurement equation is approximated by its first-order Taylor expansion:

g(x) ≈ g(m_k) + G δx    (28)

where g(x) = (1 + x_k^am) e^{i x_k^ph}, G = ∂g(x)/∂x |_{x=m} and δx ∼ N(0, P_k). The filtering equations for the EKF are then expressed as [14]:

m_k^p = A_{k-1} m_{k-1}    (29)

P_k^p = A_{k-1} P_{k-1} A_{k-1}^T + Q_{k-1}    (30)

e_k = y_k − g(m_k^p)    (31)

S_k = G(m_k^p) P_k^p G(m_k^p)^T + Σ_k    (32)

K_k = P_k^p G(m_k^p)^T S_k^{-1}    (33)

m_k = m_k^p + K_k e_k    (34)

P_k = P_k^p − K_k S_k K_k^T.    (35)

We should keep in mind that the EKF deviates from the correct solution if the function g(x) is highly nonlinear and the first-order Taylor approximation is not adequate to describe it. In that case, the sequential importance sampling particle filter is an alternative method for computing the filtering densities. It uses importance sampling and Monte Carlo methods to recursively compute the filtering densities, and it imposes no constraints on the SSM functions, f and g, nor on the noise distributions n_k and v_k.

B. Particle Filtering

The main idea behind particle filtering is to represent the posterior filtering densities p(x_k | y_{1:k}) with a set of weighted samples. The weighted samples are then used to compute the posterior means and covariances. The starting point is that, using a Monte Carlo approach, the mean value of the states x_k, given by equation (9), can be approximated as:

E[x_k] ≈ (1/N) Σ_{i=1}^{N} x_k^(i)    (36)

where N is an integer and x^(i) ∼ p(x_k | y_{1:k}), i = 1, . . . , N. In many cases it is not possible to draw samples from the density p(x_k | y_{1:k}), and as an alternative we can use an approximate distribution π(x_k | y_{1:k}) (the importance distribution) from which it is easy to draw samples. Typically, the relationship between the "true" and the importance distribution is expressed as p(x_k | y_{1:k}) = (1/Z_n) π(x_k | y_{1:k}), where Z_n is an unknown normalization constant. Using importance sampling, the posterior mean is then expressed as:

E[x_k] = ∫ x_k p(x_k | y_{1:k}) dx_k    (37)

       = ∫ x_k [p(x_k | y_{1:k}) / π(x_k | y_{1:k})] π(x_k | y_{1:k}) dx_k

       ≈ (1/N) Σ_{i=1}^{N} [p(x_k^(i) | y_{1:k}) / π(x_k^(i) | y_{1:k})] x_k^(i) = (1/N) Σ_{i=1}^{N} w_k^(i) x_k^(i)    (38)

where x_k^(i) ∼ π(x_k | y_{1:k}) and the w^(i) are the so-called importance weights. In order to compute the filtering densities recursively, a sequential version of importance sampling is needed, and accordingly the filtering distribution can be approximated as [13]:

p(x_k | y_{1:k}) ≈ Σ_{i=1}^{M} w_k^(i) δ(x_k − x_k^(i))    (39)

where M represents the number of weights/particles; as M → ∞, equation (39) converges to the true posterior distribution. The importance weights are computed as follows [13]:

w_k^(i) ∝ p(y_k | x_k^(i)) p(x_k^(i) | x_{k-1}^(i)) p(x_{0:k-1}^(i) | y_{1:k-1}) / π(x_{0:k}^(i) | y_{1:k})    (40)

where the samples x_{0:k}^(i) are drawn from the importance distribution π(x_{0:k} | y_{1:k}).
To compute the weights w_k^(i) recursively, we need to determine the relationship between the weights at the discrete time instants k and k − 1, i.e., w_k^(i) and w_{k-1}^(i). If the importance distribution π(x_{0:k} | y_{1:k}) is factorized, the expression for the weights at discrete time instant k becomes:

w_k^(i) ∝ [p(y_k | x_k^(i)) p(x_k^(i) | x_{k-1}^(i)) / π(x_k^(i) | x_{0:k-1}^(i), y_{1:k})] · w_{k-1}^(i)    (41)

where the last factor is identified as the previous weight, w_{k-1}^(i) ∝ p(x_{0:k-1}^(i) | y_{1:k-1}) / π(x_{0:k-1}^(i) | y_{1:k-1}). It should be noted that, since the weights represent a probability distribution, they must be normalized to sum to one: w_k^(i) = w_k^(i) / Σ_j w_k^(j). It is also convenient to assume that the importance density has Markovian properties, such that:

π(x_k | x_{0:k-1}^(i), y_{1:k}) = π(x_k | x_{k-1}^(i), y_{1:k}).    (42)

The requirement on the importance density is that it should be easy to draw samples from. The optimal importance density is given by π(x_k | x_{k-1}^(i), y_{1:k}) = p(x_k | x_{k-1}^(i), y_k) [13]. However, in many practical cases it is difficult to determine the optimal importance density, and it is therefore very common to use the transition probability as the importance density:

π(x_k | x_{0:k-1}^(i), y_{1:k}) = p(x_k | x_{k-1}).    (43)

In this paper, we use the transition probability as the importance density. One issue with particle filtering is that the weights of all particles except a few will approach zero. This is commonly referred to as sample depletion [13]. A solution to this problem is to introduce resampling. To determine whether resampling is needed, the following metric (the effective sample size) is used:

M_eff = 1 / Σ_{i=1}^{M} (w_k^(i))^2.    (44)

Resampling is then performed if M_eff falls below a certain threshold, typically chosen as a percentage of the total number of particles; in this paper, we choose it to be M/2. There are many algorithms that can be employed for the resampling. The method that is very commonly used, and employed throughout this paper, is based on multinomial sampling [13]. The basic idea behind multinomial sampling is to determine the mapping from {x_{0:k}^(i), w_k^(i)} to {x_{0:k}^(i), 1/M}. This can be done with the following algorithm [13]: for each i, draw u^(i) uniformly on (0, 1] and select the particle index m for which

Σ_{j=1}^{m−1} w_k^(j) < u^(i) ≤ Σ_{j=1}^{m} w_k^(j).
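Bringing the pieces together, the following is a minimal bootstrap-style particle filter: samples are drawn from the transition prior as in equation (43), the weights then reduce to w_k^(i) ∝ p(y_k | x_k^(i)) w_{k-1}^(i) per equation (41), the posterior mean follows equation (38), and multinomial resampling is triggered by the effective sample size of equation (44). The Wiener phase-noise model and all parameter values are illustrative assumptions, not the paper's experimental settings.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy model (not the paper's parameters):
#   state:       x_k = x_{k-1} + v_k,     v_k ~ N(0, sig_v^2)  (Wiener phase noise)
#   measurement: y_k = exp(i x_k) + n_k,  n_k complex Gaussian, sig_n per quadrature
K, M = 300, 500                      # time steps and particles
sig_v, sig_n = 0.05, 0.3             # process / measurement noise std
x = np.cumsum(sig_v * rng.standard_normal(K))            # true phase trajectory
y = np.exp(1j * x) + sig_n * (rng.standard_normal(K)
                              + 1j * rng.standard_normal(K))

particles = np.zeros(M)              # samples from p(x_0): point mass at zero
w = np.full(M, 1.0 / M)              # uniform initial weights
est = np.zeros(K)                    # posterior-mean phase estimates
for k in range(K):
    # Draw from the transition prior, eq. (43): x_k^(i) ~ p(x_k | x_{k-1}^(i)).
    particles = particles + sig_v * rng.standard_normal(M)
    # Weight update, eq. (41): with the transition prior as importance
    # density, w_k^(i) is proportional to p(y_k | x_k^(i)) * w_{k-1}^(i).
    resid = np.abs(y[k] - np.exp(1j * particles)) ** 2
    w = w * np.exp(-resid / (2 * sig_n ** 2))
    w /= w.sum()                     # normalize so the weights sum to one
    est[k] = np.sum(w * particles)   # weighted posterior mean, eq. (38)
    # Multinomial resampling when the effective sample size, eq. (44),
    # drops below the M/2 threshold used in the paper.
    if 1.0 / np.sum(w ** 2) < M / 2:
        idx = rng.choice(M, size=M, p=w)
        particles = particles[idx]
        w = np.full(M, 1.0 / M)
```

Because the importance density here is the transition prior, the likelihood p(y_k | x_k^(i)) alone drives the weight update, which is the design choice adopted in this paper.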
