Performance of Generalized Regression Neural Network-Based ...

Performance of Generalized Regression Neural Network-Based Channel Estimation in Vectored DSL Systems

Sean Huberman and Tho Le-Ngoc
Department of Electrical and Computer Engineering, McGill University, Montreal, Quebec, Canada
Email: [email protected]; [email protected]

Abstract: It is well known that Vectored Digital Subscriber Line (DSL) transmission promises significant theoretical data-rate increases for DSL technology; however, Vectored DSL requires full knowledge of the channel. The effectiveness of Vectored DSL transmission in a practical setting, where channel knowledge is subject to error, has yet to be determined. This paper proposes a Generalized Regression Neural Network (GRNN)-based approach to DSL channel estimation that interpolates between a subset of measured or estimated data-points. Furthermore, closed-form expressions for the effect of channel estimation error on the achievable Vectored DSL data-rate are derived, using a Zero-Forcing (ZF) interference canceller for upstream transmission and a Diagonalizing Pre-coder (DP) for downstream transmission. Finally, simulation results are provided to demonstrate the performance loss associated with channel estimation error for Vectored DSL transmission, based on the ANN approach and a linear regression approach.

Index Terms: Digital Subscriber Line (DSL), Vectored DSL, Channel Estimation, Generalized Regression Neural Network

I. INTRODUCTION

The main performance degradation in Digital Subscriber Line (DSL) systems is the interference between lines, known as crosstalk. In order to provide content-rich services, DSL service providers require higher data-rates. Dynamic Spectrum Management (DSM) [1] is a family of technologies that aim to mitigate the effects of crosstalk on the performance of DSL systems. DSM level-3 [2] refers to a special class of DSM technologies known as Vectored DSL, which aims to remove the crosstalk entirely using pre-coding and/or interference cancellation. Vectored DSL has been shown to provide significant performance benefits [2]; however, it requires full knowledge of the channel (i.e., direct lines and crosstalk). The DSL environment is slowly time-varying and, as such, lends itself very well to channel estimation techniques.

DSL technology uses a Discrete Multi-Tone (DMT) transmission scheme whereby the frequency spectrum is divided into many frequency tones (or sub-carriers). DMT mainly differs from Orthogonal Frequency Division Multiplexing (OFDM) in that each tone is modulated independently.

DSL channel estimation on a point-by-point basis is a well-researched parameter estimation problem. In [3], point-wise crosstalk was estimated using a Least-Squares (LS) method by making use of an impartial third party. A Maximum-Likelihood (ML) channel estimation technique using a training-aided expectation-maximization algorithm was proposed in [4] and [5]. A crosstalk channel estimation technique based on Signal-to-Noise Ratio (SNR) measurements at the receiver was presented in [6]. A crosstalk channel estimation technique that is optimal in the LS sense was proposed in [7], where an ML channel estimator was also derived. Finally, the effect of channel estimation error on the performance of non-vectored DSM systems was discussed in [8].

The above channel estimation techniques provide various methods for approximating a particular point on the direct and crosstalk channels; however, in order to estimate the full channel, these techniques require estimating a data-point for each frequency tone. The number of frequency tones varies depending on the type of DSL technology implemented. In particular, current Very high bit rate DSL (VDSL) technology uses 4096 frequency tones. Hence, $N \times N \times K$ channel coefficients must be estimated, where $N$ is the number of lines and $K$ is the number of frequency tones. For practical Vectored DSL systems, $K = 4096$ and $N$ typically ranges between 25 and 192. As such, the number of parameters to be estimated can be as large as approximately 151 million ($192 \times 192 \times 4096$). While the DSL channel is slowly time-varying, DSL is still a delay-sensitive technology. As such, applying the above channel estimation techniques to estimate 151 million parameters would be very time-consuming. It is worthwhile to note that the channel estimators cited above were typically designed for Asymmetric DSL (ADSL), which uses only 256 frequency tones and therefore requires far fewer parameters to be estimated.

This paper proposes the use of Artificial Neural Networks (ANNs) as nonlinear adaptive curve fitters (e.g., see Fig. 1) to reduce the number of measurement-based parameter estimates.
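The parameter counts quoted above follow directly from the $N \times N \times K$ product. The short sketch below reproduces them for the line and tone counts given in the text; the function name is only illustrative.

```python
# Number of channel coefficients to estimate: one per (receiver, transmitter, tone).
K_VDSL = 4096  # frequency tones for VDSL, as stated in the text
K_ADSL = 256   # frequency tones for ADSL, as stated in the text


def num_channel_parameters(n_lines: int, n_tones: int) -> int:
    """Return N * N * K, the size of the full direct-plus-crosstalk channel."""
    return n_lines * n_lines * n_tones


for n_lines in (25, 192):
    print(f"N = {n_lines:3d}, VDSL: {num_channel_parameters(n_lines, K_VDSL):,} parameters")

# Same largest binder size with the ADSL tone count, for comparison.
print(f"N = 192, ADSL: {num_channel_parameters(192, K_ADSL):,} parameters")
```

The largest case, 192 lines with 4096 tones, gives 150,994,944 coefficients, which is the "151 million" figure used in the text.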
The ANN is used to interpolate between sampled data-points, which can be obtained from measured results or parameter estimation techniques. The Generalized Regression Neural Network (GRNN) [9] uses a subset of sample data-points (the training set) to generate the complete estimated direct and crosstalk curves. For each of the $N \times N$ curves, the training set consists of a subset of the $K$ tones, which can be found using channel measurements and/or a parameter estimation technique.

More generally, ANNs consist of several layers: an input layer, an output layer, and hidden layers. Each hidden layer contains a set of neurons (or nodes), and the path from one layer to the next is typically scaled by a corresponding weight vector. The ANN trains (or optimizes) the weighting values of every path in the network using a training set. In general, the more neurons and the larger the training set, the more accurate the model, at the expense of computational time. This paper uses a GRNN, whose architecture lends itself very well to the problem of function interpolation, particularly for crosstalk. The specific architecture of the GRNN and the DSL channel characteristics relevant to it are discussed in Section III.

The rest of this paper is organized as follows. Section II introduces the DSL system model. Section III presents the GRNN approach to DSL channel estimation. Section IV derives closed-form expressions for the effect of channel estimation error for practical upstream and downstream Vectored DSL transmission. Finally, Section V presents some simulation results and Section VI provides some concluding remarks.

II. SYSTEM MODEL

Consider a DSL network with a set of users (modems) $\mathcal{N} = \{1, \ldots, N\}$ and a set of frequency tones $\mathcal{K} = \{1, \ldots, K\}$. Using synchronous DMT modulation, Inter-Channel Interference (ICI) is removed and transmissions can be modeled independently on each tone $k$ as:

$$\mathbf{y}_k = \mathbf{H}_k \mathbf{x}_k + \mathbf{z}_k,$$

where $\mathbf{x}_k \triangleq (x_k^1, \ldots, x_k^N)^T$ contains the transmitted signals, and $\mathbf{y}_k$ and $\mathbf{z}_k$ are similarly defined and contain the received signals and additive white Gaussian noise, respectively. As well, $h_k^{n,m} \triangleq [\mathbf{H}_k]_{n,m}$ is the channel gain from user $m$ to user $n$ on tone $k$, and $s_k^n \triangleq E\{|x_k^n|^2\}/\Delta f$ is the transmit Power Spectral Density (PSD) for user $n$ on tone $k$. $E\{\cdot\}$ denotes the expected value operation and $\Delta f$ denotes the frequency tone spacing. The SNR of user $n$ on frequency tone $k$ is defined as:

$$\mathrm{SNR}_k^n = \frac{|h_k^{n,n}|^2 s_k^n}{\sum_{m \neq n} |h_k^{n,m}|^2 s_k^m + \sigma_k^n},$$

where $\sigma_k^n \triangleq E\{|z_k^n|^2\}/\Delta f$ is the noise power density of user $n$ on frequency tone $k$. Hence, the achievable data-rate for user $n$ is defined as:

$$R^n \triangleq f_s \sum_k \log_2\left(1 + \Gamma^{-1}\,\mathrm{SNR}_k^n\right), \tag{1}$$

where $\Gamma$ is the signal-to-noise ratio gap, which is a function of the desired bit error rate, coding gain, and noise margin, and $f_s$ is the DMT symbol rate.

III. CHANNEL ESTIMATION BASED ON A SUBSET OF DATA-POINTS

A. Problem Formulation

Suppose that $\mathcal{T} = \{t_k\}$, $k = 1, \ldots, K$, are the true values of the channel to be estimated (e.g., $t_k \triangleq |h_k^{n,m}|^2$ for some $n$ and $m$) and that $\mathcal{V} = \{v_{\tilde{k}}\}$, $\tilde{k} = 1, \ldots, \tilde{K}$, are the measurements taken or estimated. Hence, $\mathcal{V} \subset \mathcal{T}$ and $\tilde{K} = |\mathcal{V}| \ll |\mathcal{T}| = K$. Let $\bar{v} = (v_1, \ldots, v_{\tilde{K}})^T$ and let $\bar{t} = (t_1, \ldots, t_K)^T$. Then, the estimator for $t_k$ is given by a function of $\bar{v}$ evaluated at the frequency tone $k$ (i.e., $\hat{t}_k = f_k(\bar{v})$). With the goal of minimizing the Mean-Squared Error (MSE), the optimization problem can be expressed as follows:

$$\bar{f}^*(\bar{v}) = \arg\min_{\bar{f}(\bar{v})} \|\bar{t} - \bar{f}(\bar{v})\|^2, \tag{2}$$

where $\bar{f}(\bar{v}) = (f_1(\bar{v}), \ldots, f_K(\bar{v}))^T$. Optimization problem (2) is an interpolation problem; however, due to the high non-linearity of the crosstalk (e.g., see Fig. 1) and since $\tilde{K} \ll K$, it is not straightforward to solve. For practical purposes, the goal is to "well-approximate" the full crosstalk (i.e., 4096 frequency tones) using as few sample points as possible (e.g., 5 or 20).

One common approach to solving such optimization problems involves enforcing a linearity constraint on the set of functions $\{\bar{f}(\bar{v})\}$. Imposing this constraint converts optimization problem (2) into a linear regression problem, which can be solved straightforwardly. Section III-C proposes and discusses the use of a GRNN as a method for solving the highly non-linear interpolation problem in a computationally efficient manner (i.e., solving for the function $\bar{f}^*(\bar{v})$). For reference, the performance of the GRNN is compared with that of linear regression in Section V.
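To make the linear-regression route to problem (2) concrete, the sketch below fits a straight line through a handful of sampled tones by least squares and evaluates it on every tone. The paper does not specify its regression model, so the straight-line-in-tone-index form is an assumption, as are the synthetic curve `t_true` and the helper names; for fixed sample locations the resulting estimate is a linear function of the sample vector, in keeping with the linearity constraint.

```python
import numpy as np


def linear_regression_estimate(sample_tones, sample_values, num_tones):
    """Least-squares straight line through the samples, evaluated on every tone.

    Assumption: the regression variable is the tone index (not stated in the paper).
    """
    # Design matrix [1, k] at the sampled tones.
    a_samples = np.column_stack([np.ones(len(sample_tones)), sample_tones])
    coeffs, *_ = np.linalg.lstsq(a_samples, sample_values, rcond=None)
    # Evaluate the fitted line on the full tone grid.
    all_tones = np.arange(num_tones)
    a_all = np.column_stack([np.ones(num_tones), all_tones])
    return a_all @ coeffs


def mse(t_true, t_hat):
    """Mean-squared error between the true and estimated curves."""
    return float(np.mean((t_true - t_hat) ** 2))


# Synthetic stand-in for a crosstalk magnitude curve in dB (hypothetical data).
K = 4096
tones = np.arange(K)
t_true = -110.0 + 15.0 * tones / K + 3.0 * np.sin(2 * np.pi * 6 * tones / K)

# Uniformly select 20 of the K tones as the measured subset.
sample_tones = np.linspace(0, K - 1, 20).round().astype(int)
t_hat = linear_regression_estimate(sample_tones, t_true[sample_tones], K)
print(f"MSE of the straight-line estimate: {mse(t_true, t_hat):.3f}")
```

A straight line cannot follow the ripples of such a curve, which is the motivation the text gives for a non-linear interpolator.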

Fig. 1. Sample ANN approximations of crosstalk from line 10 to line 6 of the 25-user 500-m channel measurements (4096 data-points), discussed in Section V. Axes: magnitude (dB) versus frequency (MHz). Curves: measured data and ANN fits with 5, 20, and 75 training points.
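Fits of the kind shown in Fig. 1 can be imitated with a very small GRNN. The sketch below follows the two-layer description of Section III-C: a radial basis layer centred on the training inputs, followed by a normalized weighted average of the training targets. It is not the authors' implementation. The `spread` parameter, the bias choice $b = \sqrt{\ln 2}/\text{spread}$ (so that the RBF output is 0.5 at a distance of one spread), and the synthetic curve are all assumptions.

```python
import numpy as np


def grnn_predict(train_x, train_y, query_x, spread=1.0):
    """One-dimensional GRNN: RBF layer followed by a normalized linear layer."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    query_x = np.asarray(query_x, dtype=float)
    # Assumed bias: the RBF output falls to 0.5 at a distance equal to `spread`.
    b = np.sqrt(np.log(2.0)) / spread
    # Radial basis layer: a[q, i] = exp(-(b * |p_q - IW_i|)^2), with IW = training inputs.
    dist = np.abs(query_x[:, None] - train_x[None, :])
    a = np.exp(-(b * dist) ** 2)
    # Specialized linear layer: dot product with the targets (LW), normalized by sum(a).
    return (a @ train_y) / a.sum(axis=1)


# Synthetic, rippled stand-in for a crosstalk magnitude curve in dB (hypothetical data).
freq = np.linspace(0.276, 17.6, 4096)  # MHz
truth = -100.0 - 0.5 * freq + 6.0 * np.sin(1.3 * freq) + 3.0 * np.sin(4.1 * freq + 0.7)

# Uniformly select 20 training points, as in the training-set construction of Section V.
idx = np.linspace(0, freq.size - 1, 20).round().astype(int)
spread = freq[idx[1]] - freq[idx[0]]  # one training-point spacing (assumed choice)
estimate = grnn_predict(freq[idx], truth[idx], freq, spread)
print(f"MSE with 20 training points: {np.mean((truth - estimate) ** 2):.3f}")
```

Because the output is a positively weighted average of the training targets, the estimate always stays within the range of the sampled values; a very small spread may drive all weights to zero for queries far from every training point, so the spread should be on the order of the sample spacing.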

B. Characteristics of DSL Channels

Generally, DSL direct channels are smooth, monotonically decreasing functions and are fairly easy to approximate. The difficulty in DSL channel approximation lies in the crosstalk channels. DSL crosstalk frequency responses are typically very curvy, often containing many local minima and maxima (e.g., see Fig. 1). As such, they can often be best interpolated by a curvy interpolation function. Several ANN architectures were tested, including Feed-Forward Neural Networks and Radial-Basis Neural Networks; however, it was found that a GRNN best approximated the curvature of the crosstalk frequency response and hence led to the best performance.

C. Generalized Regression Neural Networks

GRNNs make use of Radial Basis Functions (RBFs) when generating estimated curves. An RBF is a function which is symmetrical about a center point. The RBF used in this paper is $f(x) = \exp(-x^2)$. Measured crosstalk transfer function data exhibits many local minima and maxima (e.g., see Fig. 1); hence, the curvy nature of this RBF allows the ANN to capture the curvature of the crosstalk more accurately.

The architecture of the GRNN is shown in Fig. 2. The GRNN consists of two hidden layers: a radial basis layer and a specialized linear layer. The radial basis layer subtracts its weights, which can be thought of as representing the mean of the input, from the inputs and then applies the RBF [10]. Thus, when an input value is very close to the mean value, the difference is very close to zero, and the RBF assigns a value very close to one at the output of the first layer. Similarly, when an input point is very far from the mean value, the difference is far from zero, and the RBF assigns a value very close to zero at the output of the first layer. Hence, the first layer assigns higher weight to points that are closer to the mean than to those that are further from it. This makes the ANN less sensitive to outlier points, which can occur frequently in measured crosstalk data (e.g., if a point near one of the local minima of Fig. 1 is sampled). Hence, the $i$-th element of the vector $\bar{a}$ is given by:

$$\bar{a}_i = \exp\left\{-\left(\bar{b}_i \,\|\bar{p} - [\mathbf{IW}]_{\text{row } i}\|\right)^2\right\},$$

where, following the notation of [10], $\bar{p}$ is the input vector, $\mathbf{IW}$ is the weight matrix of the first layer, and $\bar{b}_i$ is the bias of the $i$-th neuron.

The second hidden layer corresponds to a specialized linear layer. It computes the dot product between a row of the weight matrix $\mathbf{LW}$ and the vector $\bar{a}$, normalized by the sum of the elements in $\bar{a}$. This causes the network to work as a weighted average between target values whose input values are close to the trained values [10]. Hence, the $i$-th element of the vector $\bar{y}$ is given by:

$$\bar{y}_i = \frac{[\mathbf{LW}]_{\text{row } i} \cdot \bar{a}}{\sum_j \bar{a}_j}.$$

The number of neurons used in both hidden layers, $Q$, was set to the number of training data-points, $R$.

Fig. 2. GRNN architecture.

IV. VECTORED DSL TRANSMISSION

Vectored DSL transmission [2] uses signal co-ordination between users to cancel the effects of crosstalk. For downstream transmission, all the transmitters are co-located at the Central Office (CO), and therefore the transmitted signals can be pre-distorted so that the signals arriving at the users are crosstalk-free. For upstream transmission, all the receivers are co-located at the CO, and therefore the received signal can be processed in order to remove the crosstalk component. Theoretically, in both cases, once the crosstalk is removed, each line can transmit at its interference-free data-rate. In this paper, the interference cancellation uses Zero-Forcing (ZF) [11] and the pre-coding uses a Diagonalizing Pre-coder (DP) [12].

A. Vectored DSL with Perfect Channel Knowledge

Ideal Vectored DSL transmission assumes perfect knowledge of the channel. Sections IV-A1 and IV-A2 briefly discuss ideal upstream and downstream transmission, respectively.

1) Upstream: Zero Forcing with Perfect Channel Knowledge: The received vector is ideally processed using a ZF interference canceller [11], as follows:

$$\hat{\mathbf{x}}_k = \mathbf{H}_k^{-1}\mathbf{y}_k = \mathbf{x}_k + \mathbf{H}_k^{-1}\mathbf{z}_k.$$

Therefore, ideally, the crosstalk is removed entirely. It is shown in [11] that ZF removes the crosstalk at the expense of increasing the noise.

2) Downstream: Diagonalizing Pre-Coder with Perfect Channel Knowledge: A pre-coding matrix, $\mathbf{Q}_k$, is introduced and the transmission is re-written as:

$$\mathbf{y}_k = \mathbf{H}_k\tilde{\mathbf{x}}_k + \mathbf{z}_k = \mathbf{H}_k(\mathbf{Q}_k\mathbf{x}_k) + \mathbf{z}_k,$$

where $\mathbf{Q}_k = \beta_k^{-1}\mathbf{H}_k^{-1}\operatorname{diag}\{h_k^{1,1}, \ldots, h_k^{N,N}\}$ and $\operatorname{diag}\{a_1, \ldots, a_N\}$ denotes the matrix whose diagonal elements are $a_1, \ldots, a_N$ and whose other elements are all zero [12]. As well, the scaling factor is defined as [12]:

$$\beta_k = \max_n \left\|\left[\mathbf{H}_k^{-1}\operatorname{diag}\{h_k^{1,1}, \ldots, h_k^{N,N}\}\right]_{\text{row } n}\right\|.$$

The scaling factor, $\beta_k$, is used to ensure compliance with the spectral mask constraint after pre-coding. The DP essentially "diagonalizes" the channel matrix, allowing each user to transmit crosstalk-free; however, each user's direct channel is scaled down by $\beta_k$. It is shown in [12] that $\beta_k \approx 1$, and therefore, ideally, the DP allows each user to approximately reach their respective single-user transmission capacity.
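The two ideal operations of Section IV-A can be exercised numerically on a single tone. The sketch below is a minimal illustration and not the implementation of [11] or [12]: the diagonally dominant random channel and all helper names are assumptions. It checks that ZF leaves only the filtered noise $\mathbf{H}_k^{-1}\mathbf{z}_k$ and that the DP turns the effective channel $\mathbf{H}_k\mathbf{Q}_k$ into a diagonal matrix scaled by $\beta_k^{-1}$.

```python
import numpy as np


def random_dsl_like_channel(n, rng, crosstalk_scale=0.05):
    """Hypothetical single-tone channel: strong direct paths, weak crosstalk."""
    h = crosstalk_scale * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    h[np.diag_indices(n)] = 1.0 + 0.1 * rng.standard_normal(n)
    return h


def zf_cancel(h, y):
    """Upstream zero-forcing canceller: x_hat = H^{-1} y."""
    return np.linalg.solve(h, y)


def dp_precoder(h):
    """Downstream diagonalizing pre-coder Q = H^{-1} diag{h^{n,n}} / beta."""
    d = np.diag(np.diag(h))
    p = np.linalg.solve(h, d)                   # H^{-1} diag{h^{1,1}, ..., h^{N,N}}
    beta = np.max(np.linalg.norm(p, axis=1))    # largest row norm
    return p / beta, beta


rng = np.random.default_rng(0)
N = 4
H = random_dsl_like_channel(N, rng)
x = rng.choice([-1.0, 1.0], size=N) + 1j * rng.choice([-1.0, 1.0], size=N)
z = 0.01 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))

# Upstream: the residual after ZF is exactly the noise filtered by H^{-1}.
y = H @ x + z
x_hat = zf_cancel(H, y)
print("ZF residual equals H^-1 z:", np.allclose(x_hat - x, np.linalg.solve(H, z)))

# Downstream: the effective channel H Q is diagonal, scaled by 1 / beta.
Q, beta = dp_precoder(H)
print("beta =", beta)
print("H Q is diagonal:", np.allclose(H @ Q, np.diag(np.diag(H)) / beta))
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a numerical convenience only; the result is the same matrix product described in the text.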

B. Vectored DSL with Imperfect Channel Knowledge

The Vectored DSL transmission techniques discussed in Section IV-A assume perfect knowledge of the channel. In practice, the channel is estimated and will have an associated estimation error. The effect of estimation error on Vectored DSL is investigated in this section.

1) Upstream: Zero Forcing with Imperfect Channel Knowledge: Upstream vectored transmission with an estimated channel matrix, $\hat{\mathbf{H}}_k$, is given by (3):

$$\hat{\mathbf{x}}_k = \hat{\mathbf{H}}_k^{-1}\mathbf{y}_k = \hat{\mathbf{H}}_k^{-1}(\mathbf{H}_k\mathbf{x}_k + \mathbf{z}_k) = \hat{\mathbf{H}}_k^{-1}\mathbf{H}_k\mathbf{x}_k + \hat{\mathbf{H}}_k^{-1}\mathbf{z}_k. \tag{3}$$

The estimated channel matrix can be written as $\hat{\mathbf{H}}_k = \mathbf{H}_k + \Delta\mathbf{H}_k$, where $\Delta\mathbf{H}_k$ represents the measurement error. The inverse of $\hat{\mathbf{H}}_k$ can be written as [13]:

$$\hat{\mathbf{H}}_k^{-1} = (\mathbf{H}_k + \Delta\mathbf{H}_k)^{-1} = \mathbf{H}_k^{-1} - \mathbf{H}_k^{-1}\left(\mathbf{I} + \Delta\mathbf{H}_k\mathbf{H}_k^{-1}\right)^{-1}\Delta\mathbf{H}_k\mathbf{H}_k^{-1}. \tag{4}$$
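Identity (4) is easy to confirm numerically. The sketch below compares both sides for a random, well-conditioned matrix and a small perturbation; the matrices and their scales are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np


def inverse_of_sum(h, dh):
    """Right-hand side of (4): H^{-1} - H^{-1} (I + dH H^{-1})^{-1} dH H^{-1}."""
    h_inv = np.linalg.inv(h)
    identity = np.eye(h.shape[0])
    return h_inv - h_inv @ np.linalg.inv(identity + dh @ h_inv) @ dh @ h_inv


rng = np.random.default_rng(1)
N = 5
# Hypothetical channel (near-identity) and estimation error (small perturbation).
H = np.eye(N) + 0.05 * (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
dH = 0.01 * (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))

lhs = np.linalg.inv(H + dH)   # direct inverse of the estimated channel
rhs = inverse_of_sum(H, dH)   # expansion around the true channel
print("max |lhs - rhs| =", np.max(np.abs(lhs - rhs)))
```

The two sides agree to floating-point precision whenever both $\mathbf{H}_k$ and $\mathbf{H}_k + \Delta\mathbf{H}_k$ are invertible.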

Substituting (4) into (3) gives:

$$\hat{\mathbf{x}}_k = \mathbf{x}_k + \hat{\mathbf{H}}_k^{-1}\mathbf{z}_k - \underbrace{\mathbf{H}_k^{-1}\left(\mathbf{I} + \Delta\mathbf{H}_k\mathbf{H}_k^{-1}\right)^{-1}\Delta\mathbf{H}_k}_{(\mathbf{E}_{US})_k}\,\mathbf{x}_k = \left(\mathbf{I} - (\mathbf{E}_{US})_k\right)\mathbf{x}_k + \hat{\mathbf{H}}_k^{-1}\mathbf{z}_k, \tag{5}$$

where $(\mathbf{E}_{US})_k$ is an $N \times N$ matrix representing the error in estimation associated with upstream transmission on frequency tone $k$. Therefore, the upstream SNR for line $n$ on frequency tone $k$ can be expressed as:

$$\mathrm{SNR}_k^n = \frac{\left|1 - \left[(\mathbf{E}_{US})_k\right]_{n,n}\right|^2 s_k^n}{\sum_{m \neq n}\left|\left[(\mathbf{E}_{US})_k\right]_{n,m}\right|^2 s_k^m + \left\|\left[\hat{\mathbf{H}}_k^{-1}\right]_{\text{row } n}\right\|^2 \sigma_k^n}. \tag{6}$$

The corresponding achievable data-rate can be easily found by combining (1) and (6). It can be seen that as $\Delta\mathbf{H}_k$ approaches the all-zeros matrix, $(\mathbf{E}_{US})_k$ vanishes and the SNR approaches that of ideal channel knowledge:

$$\mathrm{SNR[Ideal]}_k^n = \frac{s_k^n}{\left\|\left[\mathbf{H}_k^{-1}\right]_{\text{row } n}\right\|^2 \sigma_k^n}.$$

2) Downstream: Diagonalizing Pre-Coder with Imperfect Channel Knowledge: Downstream vectored transmission with an estimated channel matrix is given by (7):

$$\hat{\mathbf{y}}_k = \mathbf{H}_k\left(\beta_k^{-1}\hat{\mathbf{H}}_k^{-1}\operatorname{diag}\{\hat{h}_k^{1,1}, \ldots, \hat{h}_k^{N,N}\}\right)\mathbf{x}_k + \mathbf{z}_k. \tag{7}$$

Similar to Section IV-B1, substituting (4) into (7) gives:

$$\hat{\mathbf{y}}_k = \mathbf{z}_k + \beta_k^{-1}\operatorname{diag}\{\hat{h}_k^{1,1}, \ldots, \hat{h}_k^{N,N}\}\mathbf{x}_k - \underbrace{\beta_k^{-1}\left(\mathbf{I} + \Delta\mathbf{H}_k\mathbf{H}_k^{-1}\right)^{-1}\Delta\mathbf{H}_k\mathbf{H}_k^{-1}\operatorname{diag}\{\hat{h}_k^{1,1}, \ldots, \hat{h}_k^{N,N}\}}_{(\mathbf{E}_{DS})_k}\,\mathbf{x}_k = \left(\beta_k^{-1}\operatorname{diag}\{\hat{h}_k^{1,1}, \ldots, \hat{h}_k^{N,N}\} - (\mathbf{E}_{DS})_k\right)\mathbf{x}_k + \mathbf{z}_k, \tag{8}$$

where $(\mathbf{E}_{DS})_k$ is an $N \times N$ matrix representing the error in estimation associated with downstream transmission on frequency tone $k$. Therefore, the downstream SNR for line $n$ on frequency tone $k$ can be expressed as:

$$\mathrm{SNR}_k^n = \frac{\left|\beta_k^{-1}\hat{h}_k^{n,n} - \left[(\mathbf{E}_{DS})_k\right]_{n,n}\right|^2 (\tilde{s})_k^n}{\sum_{m \neq n}\left|\left[(\mathbf{E}_{DS})_k\right]_{n,m}\right|^2 (\tilde{s})_k^m + \sigma_k^n}, \tag{9}$$

where $(\tilde{s})_k^n = E\left\{\left|[\mathbf{Q}_k]_{\text{row } n}\,\mathbf{x}_k\right|^2\right\}/\Delta f$. As with upstream transmission, the corresponding achievable data-rate can be easily found by combining (1) and (9). Similar to the upstream case, as $\Delta\mathbf{H}_k$ approaches the all-zeros matrix, $(\mathbf{E}_{DS})_k$ vanishes and the SNR approaches that of ideal channel knowledge:

$$\mathrm{SNR[Ideal]}_k^n = \frac{\beta_k^{-2}\,|h_k^{n,n}|^2\,(\tilde{s})_k^n}{\sigma_k^n}.$$

V. NUMERICAL RESULTS

Channel measurements are used to evaluate the effect of ANN channel estimation error on a 25-line 500-m DSL system of 26-AWG cables. The 500-m measurements were taken from 0 to 30 MHz with a spacing of 8 × 4.3125 kHz. For practical DSL systems, only 1147 upstream and 2285 downstream frequency tones were used (varying from 276 kHz to 17.6 MHz with a tone-spacing of 4.3125 kHz) [14]. The measured data was interpolated in order to more accurately represent the frequency tones used in practical DSL systems.

A. Performance with Imperfect Channel Knowledge

As discussed in Section IV, a ZF interference canceller is applied in the upstream and a DP is applied in the downstream. For both upstream and downstream transmission, a flat PSD is applied for each line. The training data was constructed by uniformly selecting the desired number of data-points from the true measured values.

The performance of the GRNN and linear regression methods is compared in Fig. 3 for varied numbers of training (sampled) points. Fig. 3 shows that the GRNN is capable of approaching the performance with ideal channel knowledge as the number of data-points taken increases. More specifically, in order to achieve over 95% of the ideal channel knowledge case, as few as 20 training points were needed for both upstream and downstream transmission.

Fig. 3. Effect of channel estimation for varied number of training points. Axes: R (estimate) / R (ideal) versus number of training points. Curves: ideal channel knowledge, GRNN upstream, GRNN downstream, regression upstream, and regression downstream.

As well, Fig. 4 shows the average MSE (computed in dB) between the channel estimates and the true measured values for various numbers of training points. The average MSE achieved by the estimation techniques is related to how effectively optimization problem (2) is solved. As expected, as the number of training points increases, the average MSE can be made arbitrarily close to zero.

It is interesting to note that when five training points were used, the average MSE for the GRNN and the linear regression were nearly identical; however, the corresponding percent of ideal channel knowledge differed by approximately 5% with downstream transmission. This may be because, while the average MSE was almost identical, the GRNN was capable of capturing the curvature of the crosstalk (e.g., see Fig. 1), resulting in better performance. When five training points were used, the linear regression approach provided a slightly lower average MSE. Hence, when five training points were used, the linear regression approach provided a slightly higher percent of ideal channel knowledge than the GRNN approach for upstream transmission; however, the GRNN gain in downstream transmission is significantly larger than the upstream transmission loss with respect to the linear regression approach.
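The upstream expressions (5) and (6) of Section IV-B1 can be exercised on a toy example to see how an estimation error feeds into the rate ratio of the kind plotted in Fig. 3. The sketch below is only a single-tone illustration under stated assumptions: the random channel, the error scale, the flat PSD and noise values, and the 12 dB SNR gap are all hypothetical and are not taken from the paper's measurements.

```python
import numpy as np


def upstream_snr(h, dh, s, sigma):
    """Per-line upstream SNR from (6) on one tone; also returns (E_US)_k from (5)."""
    n = h.shape[0]
    h_inv = np.linalg.inv(h)
    h_hat_inv = np.linalg.inv(h + dh)
    # (E_US)_k = H^{-1} (I + dH H^{-1})^{-1} dH
    e_us = h_inv @ np.linalg.inv(np.eye(n) + dh @ h_inv) @ dh
    signal = np.abs(1.0 - np.diag(e_us)) ** 2 * s
    # Residual crosstalk: off-diagonal entries of E_US weighted by the other lines' PSDs.
    e_off = np.abs(e_us) ** 2
    np.fill_diagonal(e_off, 0.0)
    interference = e_off @ s
    # Noise enhancement: squared row norms of the estimated inverse.
    noise = np.sum(np.abs(h_hat_inv) ** 2, axis=1) * sigma
    return signal / (interference + noise), e_us


rng = np.random.default_rng(2)
N = 4
H = np.eye(N) + 0.05 * (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
dH = 0.01 * (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
s = np.ones(N)             # flat transmit PSD (arbitrary units, hypothetical)
sigma = 1e-3 * np.ones(N)  # noise PSD (arbitrary units, hypothetical)
gamma = 10 ** (12.0 / 10)  # hypothetical 12 dB SNR gap

snr_est, E_us = upstream_snr(H, dH, s, sigma)
snr_ideal, _ = upstream_snr(H, np.zeros_like(H), s, sigma)

# Consistency with (5): H_hat^{-1} H = I - E_US.
print("(5) holds:", np.allclose(np.linalg.inv(H + dH) @ H, np.eye(N) - E_us))

# Rate ratio from (1), summed over lines on this single tone (f_s cancels in the ratio).
ratio = np.sum(np.log2(1 + snr_est / gamma)) / np.sum(np.log2(1 + snr_ideal / gamma))
print(f"R(estimate) / R(ideal) on this tone: {ratio:.4f}")
```

Setting the error matrix to zero recovers the ideal SNR expression, which is how `snr_ideal` is obtained above.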

Fig. 4. Average MSE (dB) for varied number of training points. Curves: GRNN upstream, GRNN downstream, regression upstream, and regression downstream.

B. GRNN Run-Time Results

Obtaining one set of direct or crosstalk channel measurements over 4096 frequency tones takes on the order of minutes using a typical VDSL2 DSLAM [1]. Hence, the time required to compute each data-point is on the order of milliseconds. Thus, reducing the number of data-points taken can significantly reduce the amount of time required to gain channel knowledge, provided the run-time of the GRNN is negligible. Fig. 5 shows that the run-time of the GRNN is linear in the number of training points used and on the order of milliseconds. In particular, sampling the channel measurements and applying a GRNN can reduce the time required to obtain one set of the $N \times N$ channel measurements from minutes to milliseconds.

Fig. 5. Average run-time for the GRNN in milliseconds.

VI. CONCLUSION

This paper investigated the performance of Vectored DSL under imperfect channel knowledge and proposed a GRNN-based technique for DSL channel estimation. As well, closed-form expressions for the effect of channel estimation error on the achievable data-rate were derived for upstream and downstream transmission when a ZF interference canceller and a DP are used, respectively. The performance of the GRNN-based channel estimation was compared to that of linear regression. It was shown that a GRNN could achieve over 95% of the ideal channel knowledge case using as few as 20 (out of 4096) data-points. Furthermore, it was shown that a GRNN could achieve performance arbitrarily close to that of ideal channel knowledge as the number of data-points used increases. The key implication of this paper is that the performance loss associated with imperfect channel knowledge is negligible with respect to the performance benefits associated with Vectored DSL.

REFERENCES

[1] S. Huberman, C. Leung, and T. Le-Ngoc, “Dynamic Spectrum Management (DSM) Algorithms for Multi-User xDSL,” IEEE Commun. Surveys Tuts., in press.
[2] G. Ginis and J. Cioffi, “Vectored transmission for digital subscriber line systems,” IEEE J. Sel. Areas Commun., vol. 20, no. 5, pp. 1085–1104, Jun. 2002.
[3] C. Zeng, C. Aldana, A. Salvekar, and J. Cioffi, “Crosstalk identification in xDSL systems,” IEEE J. Sel. Areas Commun., vol. 19, no. 8, pp. 1488–1496, Aug. 2001.
[4] C. H. Aldana and J. Cioffi, “Channel tracking for multiple input, single output systems using EM algorithm,” in Proceedings of the 2001 IEEE International Conference on Communications, ser. ICC’01. IEEE Press, Jun. 2001, pp. 586–590.
[5] E. d. C. C. H. Aldana and J. M. Cioffi, “Channel estimation for multicarrier multiple input single output systems using the EM algorithm,” IEEE Trans. Signal Process., vol. 51, no. 12, pp. 3280–3292, 2003.
[6] J. Louveaux, A. Kalakech, M. Guenach, J. Maes, M. Peeters, and L. Vandendorpe, “An SNR-assisted crosstalk channel estimation technique,” in Communications, 2009. ICC ’09. IEEE International Conference on, Jun. 2009, pp. 1–5.
[7] F. Lindqvist, N. Lindqvist, B. Dortschy, P. Ödling, P. O. Börjesson, K. Ericson, and E. Pelaes, “Crosstalk channel estimation via standardized two-port measurements,” EURASIP J. Adv. Signal Process, vol. 2008, pp. 201:1–201:14, Jan. 2008.
[8] N. Lindqvist, F. Lindqvist, M. Monteiro, B. Dortschy, E. Pelaes, and A. Klautau, “Impact of crosstalk channel estimation on the DSM performance for DSL networks,” EURASIP J. Adv. Signal Process, vol. 2010, pp. 2:1–2:11, Feb. 2010.
[9] D. F. Specht, “A general regression neural network,” IEEE Trans. Neural Netw., vol. 2, no. 6, pp. 568–576, Nov. 1991.
[10] H. Demuth, M. Beale, and M. Hagan, “Neural network toolbox 6 user’s guide,” The MathWorks, Inc., 2008.
[11] R. Cendrillon, G. Ginis, E. Van den Bogaert, and M. Moonen, “A near-optimal linear crosstalk canceler for upstream VDSL,” IEEE Trans. Signal Process., vol. 54, no. 8, pp. 3136–3146, Aug. 2006.
[12] R. Cendrillon, G. Ginis, E. Van den Bogaert, and M. Moonen, “A near-optimal linear crosstalk precoder for downstream VDSL,” IEEE Trans. Commun., vol. 55, no. 5, pp. 860–863, May 2007.
[13] H. V. Henderson and S. R. Searle, “On deriving the inverse of a sum of matrices,” SIAM Review, vol. 23, no. 1, pp. 53–60, Jan. 1981.
[14] “Very high speed digital subscriber line transceivers 2 (VDSL2),” ITU Std., vol. G.993.2 Amendment 1, 2007.
