GENERALIZED FLEXIBLE SPLITTING FUNCTION OUTPERFORMS CLASSICAL APPROACHES IN BLIND SIGNAL SEPARATION OF COMPLEX ENVIRONMENT

Michele Scarpiniti, Daniele Vigliano, Raffaele Parisi and Aurelio Uncini
INFOCOM Department, “Sapienza” Università di Roma
via Eudossiana 18, 00184 Roma, Italy
phone: (+39) 0644585495, fax: (+39) 064873300, email:
[email protected]

ABSTRACT

This paper introduces a novel approach to Blind Separation in a complex environment, based on a bi-dimensional flexible activation function (AF), and compares the performance of this architecture with the classical approach. The generalized complex function is realized by a flexible bi-dimensional spline-based approach, with one spline for the real part and one for the imaginary part, avoiding the restriction due to Liouville's theorem. The flexibility of the surface allows the control points to be learned with gradient-based techniques. Experimental results demonstrate the effectiveness of the proposed method.

Index Terms— Blind signal separation, independent component analysis, complex flexible activation functions.

1. INTRODUCTION

From a general point of view it is well known that complex-domain neural networks are attractive for many signal processing applications [5], especially where data have a complex nature, so that both amplitude and phase recovery is essential [1, 7]. One of the main issues in designing complex neural networks is the choice of the activation function. Let f(z) be the complex activation function (AF), with z ∈ C and z = x + jy; the main properties that f(z) should satisfy are [11]:
1. f(z) = u(x, y) + jv(x, y);
2. f(z) should be nonlinear and bounded;
3. in order to derive the back-propagation (BP) algorithm, the partial derivatives of f(z) should exist and be bounded.
Unfortunately, analyticity and boundedness are in contrast by Liouville's theorem [6], according to which functions that are bounded and differentiable on the whole complex plane are constant. In other words, f(z) should be defined as a nonlinear complex function that is bounded almost everywhere in the complex domain C [1, 6-8].
Recently, a complex-valued adaptive spline network based on the splitting method has been presented [3, 10]: the complex nonlinear function is realized by splitting the real and imaginary parts and modeling each of them with a mono-dimensional spline-based architecture. More recently, in [9, 11], a complex neural network using a bi-dimensional (2D) spline has been introduced in order to define a new class of flexible AFs, called generalized splitting AFs, which are bounded and (locally) analytic functions, suitable for defining a new class of complex-domain neural networks. In this paper we introduce a novel network architecture for blind separation [2, 10] in the complex domain, in the case of a linear and instantaneous mixing environment, extending the generalized splitting AF, and we compare it with the already widely adopted approaches. This kind of bi-dimensional architecture is very attractive whenever there is some correlation between the real and imaginary parts of the signals involved in the processing, an issue of large interest in the signal processing community.
2. THE GENERALIZED SPLITTING FUNCTION

2.1. Spline approximation neuron

In recent years, great interest in the intelligent use of activation functions has been directed at reducing hardware complexity and improving generalization ability. The mono-dimensional spline neuron is a LUT-based activation function that can address this problem [10]. The spline neuron performs the interpolation of the LUT values, which represent samples of the function at fixed abscissas. The interpolation scheme involves only the four control points nearest to the point to be adapted, selected on the basis of the index parameter i: this is the so-called local adaptation. The position of the abscissa with respect to the inner interval of control points is specified by the local parameter u [3, 10]; this scheme guarantees a continuous first derivative and the capability to locally adapt the curve. A detailed treatment of the spline activation neuron is beyond the scope of this section; see [3, 9, 10] for more details. A minimal sketch of the evaluation step is shown below.
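The following Python sketch illustrates the LUT-plus-local-interpolation mechanism just described. It is a minimal illustration, not the authors' implementation; the names (spline_neuron_1d, locate, q, dx) are chosen here for clarity, and the Catmull-Rom basis matrix is the standard one.

```python
import numpy as np

# Standard Catmull-Rom basis (one admissible choice for the matrix M).
M_CR = 0.5 * np.array([[-1.0,  3.0, -3.0,  1.0],
                       [ 2.0, -5.0,  4.0, -1.0],
                       [-1.0,  0.0,  1.0,  0.0],
                       [ 0.0,  2.0,  0.0,  0.0]])

def locate(x, dx, n):
    """Map abscissa x onto a uniform LUT of n knots with spacing dx,
    centred on zero: return the index i of the first of the four
    active control points and the local parameter u in [0, 1)."""
    z = x / dx + (n - 1) / 2.0
    i = int(np.clip(np.floor(z), 1, n - 3)) - 1   # clamp at the borders
    u = z - np.floor(z)
    return i, u

def spline_neuron_1d(x, q, dx):
    """Evaluate the mono-dimensional spline neuron at x, where q is
    the LUT of control-point ordinates (a 1D numpy array)."""
    i, u = locate(x, dx, len(q))
    T = np.array([u**3, u**2, u, 1.0])   # T = [u^3 u^2 u 1]
    return T @ M_CR @ q[i:i + 4]         # only 4 control points involved
```

Note how adaptation is local by construction: a gradient step on the output touches only the four ordinates q[i:i+4].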
The mono-dimensional spline neuron can be generalized to realize bi-dimensional functions as hyper-surface interpolation of control points using higher-order interpolants [9]. In particular, piecewise splines are employed here in order to render the hyper-surface continuous in its partial derivatives. The entire approximation is represented through the concatenation of local functions, each controlled by 4² control points, which lie on a regular 2D grid in R², defined over the region 0 ≤ u1, u2 ≤ 1 (see figure 1). In matrix formulation it is expressed as follows:

$$y = h(i_1, i_2; u_1, u_2) = \mathbf{T}_2 \mathbf{M} \left(\mathbf{T}_1 \mathbf{M} \mathbf{Q}\right)^T \tag{1}$$

where $\mathbf{T}_k = [u_k^3 \;\, u_k^2 \;\, u_k \;\, 1]$, M is the matrix that selects the chosen basis (Catmull-Rom (CR) or B-spline) and Q is a two-dimensional structure containing the local control points.

Fig. 1. An example of a 2D CR-spline
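As a companion to eq. (1), the sketch below evaluates one local patch of the surface. It reuses M_CR and locate from the previous sketch; the full control-point grid Q is assumed to be a 2D numpy array, and the names are again illustrative.

```python
def spline_surface_2d(x1, x2, Q, dx):
    """Evaluate eq. (1) on the 4x4 patch selected by (x1, x2):
    y = T2 M (T1 M Q_local)^T, with Q_local the active control points."""
    i1, u1 = locate(x1, dx, Q.shape[0])
    i2, u2 = locate(x2, dx, Q.shape[1])
    T1 = np.array([u1**3, u1**2, u1, 1.0])
    T2 = np.array([u2**3, u2**2, u2, 1.0])
    Q_local = Q[i1:i1 + 4, i2:i2 + 4]
    # With 1-D numpy arrays the transpose in (T1 M Q)^T is implicit.
    return T2 @ M_CR @ (T1 @ M_CR @ Q_local)
```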
5. ux vy 6= vx uy . Note that by Liouville’s theorem, the second and the fourth conditions are redundant. Using this fact Kim & Adali [8] reduced the above conditions into four conditions given below: 1. (2) is nonlinear in x and y; 2. f (z) should have no singularities and be bounded for all z in a bounded set; 3. the partial derivatives ux , uy , vx and vy exist and are bounded; 4. ux vy 6= vx uy . If not, then f (z) is not a suitable activation function except in the following cases: • ux = vx = 0 and uy 6= 0, vy 6= 0; • uy = vy = 0 and ux 6= 0, vx 6= 0
2.2. The complex activation function Let f (z) be the complex activation function with z ∈ C and z = x + jy, it should satisfy the properties introduced in [11]. The third property has been introduced to grant the derivability of the back-propagation algorithm. According with the need to have complex derivative f 0 (z) of the activation function without specifying a domain, the most critical issue is the trade off between the boundedness and analyticity of the nonlinear functions in complex domain C. If the domain is taken to be C, then such functions are called entire. The problem on the existence of an activation function which is both entire and bounded is stated by the following [6] Theorem (Liouville) If f (z) is entire and bounded on the complex plane C, then f (z) is a constant function. Directly from Liouville’s theorem follows that entire functions are not suitable activation functions. Now we can consider the expression of a complex function in relation to the real and imaginary part: f (z) = f (x, y) = u(x, y) + jv(x, y)
(2)
Note that both sets of conditions above emphasize boundedness of an activation function and its partial derivatives, even when the function is defined in a local domain of interest. By Liouville’s theorem the cost of this restriction is that a bounded activation function cannot be analytic, which consequently lets the complex activation function to approximate the partial derivatives. A lot of functions proposed for complex neural networks violate the second and third boundedness requirements of both sets of conditions above for the presence of singular points, which hardly pose a problem in learning [1, 7, 8]. The last restriction ux vy 6= vx uy was originally imposed to guarantee continuous learning, but this condition is unnecessary for fully complex functions. This because the fully complex functions satisfy the Cauchy-Riemann equations, which are a necessary condition for a complex function to be analytic at a point z ∈ C [6]. Recent works [1] have shown that, for the multilayer perceptron (MLP), a complex counterpart of the universal approximation theorem can be shown with activation functions that are entire (analytic for all values of z) but bounded only almost everywhere, i.e. the elementary transcendental function
ponent outputs y = Wx
(4)
where y is an estimate of the true source vector s, and its components are as independent as possible. For simplicity we assume that the unknown mixing matrix is square, in other words we assume that M = N . Fig. 2. The generalized splitting activation function (GSAF) (ETF). In the case of infomax, since the network is single layer, and the output of the first layer is not used beyond the optimization stage, the problem is simpler. Following these considerations Vitagliano & al. [11] proposed a complex neural network based on bi-dimensional spline AF. Considering the complex function (2) it is possible to express both real and imaginary part as bi-dimensional real functions u(x, y) and v(x, y), which must be bounded and differentiable, with bi-dimensional splines. This AF is known as generalized splitting activation function (GSAF), see figure 2.
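To make the splitting construction concrete, here is a hedged sketch reusing spline_surface_2d from the sketch above; the two LUTs Q_re and Q_im (2D numpy arrays for the real and imaginary surfaces) and all names are assumptions of this illustration, not the authors' code.

```python
def gsaf(z, Q_re, Q_im, dx):
    """Generalized splitting AF: f(z) = u(x, y) + j*v(x, y), where u
    and v are two independent 2D spline surfaces evaluated at the
    point (Re{z}, Im{z})."""
    x, y = z.real, z.imag
    u = spline_surface_2d(x, y, Q_re, dx)   # real-part surface u(x, y)
    v = spline_surface_2d(x, y, Q_im, dx)   # imaginary-part surface v(x, y)
    return u + 1j * v
```

Because each surface is a locally adapted interpolation of a bounded LUT, the resulting f(z) is bounded and locally analytic, as required by the conditions above.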
3. BLIND SIGNAL SEPARATION PROBLEM

3.1. Linear and instantaneous model

In the linear model for complex blind signal separation (BSS) no particular a priori assumption about the sources is required, apart from their statistical independence [2]. Moreover, as in real-domain problems, in the complex case the cumulative density functions (cdf) of the sources are usually unknown; the choice of a flexible AF is justified by the fact that it can adapt to the unknown cdf. Suppose we have M observed complex signals x1(t), ..., xM(t), which are assumed to be mixtures of N independent complex source signals s1(t), ..., sN(t). Then s(t) = [s1(t), ..., sN(t)]^T and x(t) = [x1(t), ..., xM(t)]^T are called the source vector and the observation vector, respectively. Hence x(t) = F(s(t)), where F is the unknown mixing system; in its general form, F may be nonlinear or may have memory. The goal of BSS is to provide a separating system G such that each component of the output vector y(t) = G(x(t)) ≃ s(t) is an estimate of an original source. If the mixing-separating system is linear and instantaneous (memoryless), the independence of the outputs ensures the separation of the sources; in other words, linear instantaneous mixtures are separable. By a linear instantaneous mixture we mean a mixture of the form

$$\mathbf{x} = \mathbf{A}\mathbf{s} \tag{3}$$

in which A is a complex mixing matrix. A separating matrix W must then be estimated to generate the independent component outputs

$$\mathbf{y} = \mathbf{W}\mathbf{x} \tag{4}$$

where y is an estimate of the true source vector s, with components as independent as possible. For simplicity we assume that the unknown mixing matrix is square, i.e. M = N.

3.2. Information Maximization

We apply the learning algorithm proposed by Bell & Sejnowski [2] for an N → N network, generalizing it by using the bi-dimensional spline functions (1) as activation functions in place of the logistic sigmoid or the hyperbolic tangent. Maximizing the joint entropy of the output vector y [2, 4] (the ME approach), we can derive the following learning algorithm:

$$\Delta \mathbf{W} = \eta_W \left[ (\mathbf{W}^H)^{-1} + 2\,\mathbf{g}\,\mathbf{x}^H \right] \tag{5}$$

which is the final expression of the generalized algorithm proposed by Bell & Sejnowski, where ηW is the learning rate, a real positive constant, and g is defined through the derivative of the activation function.
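A minimal per-sample sketch of update (5), assuming x is an (N, 1) column vector, W the current separating matrix, and score a user-supplied callable returning the term g of the text (all names are illustrative):

```python
import numpy as np

def infomax_step(W, x, score, eta_w=0.0023):
    """One stochastic update of eq. (5): W <- W + eta_w*((W^H)^-1 + 2 g x^H).
    `score` maps the AF input W @ x to the vector g, shape (N, 1)."""
    g = score(W @ x)
    dW = eta_w * (np.linalg.inv(W.conj().T) + 2.0 * g @ x.conj().T)
    return W + dW
```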
The learning rule to adapt the control points of the spline functions is the following:

$$\Delta Q_{j,\,i_R+m_R,\,i_J+m_J} = 2\eta_Q \left[ \frac{\mathbf{T}_{2j}\mathbf{M}\big(\dot{\mathbf{T}}_{1j}\mathbf{M}_{m_R}\big)^T + \mathbf{T}_{2j}\mathbf{M}_{m_J}\big(\dot{\mathbf{T}}_{1j}\mathbf{M}\mathbf{Q}_{jR}\big)^T}{\big(\mathbf{T}_{2j}\mathbf{M}(\dot{\mathbf{T}}_{1j}\mathbf{M}\mathbf{Q}_{jR})^T\big)^2 + \big(\dot{\mathbf{T}}_{2j}\mathbf{M}(\mathbf{T}_{1j}\mathbf{M}\mathbf{Q}_{jR})^T\big)^2} + \frac{\big(\dot{\mathbf{T}}_{2j}\mathbf{M}(\mathbf{T}_{1j}\mathbf{M}\mathbf{Q}_{jR})^T\big)\big(\dot{\mathbf{T}}_{2j}\mathbf{M}_{m_J}(\mathbf{T}_{1j}\mathbf{M}_{m_R})^T\big)}{\big(\mathbf{T}_{2j}\mathbf{M}(\dot{\mathbf{T}}_{1j}\mathbf{M}\mathbf{Q}_{jR})^T\big)^2 + \big(\dot{\mathbf{T}}_{2j}\mathbf{M}(\mathbf{T}_{1j}\mathbf{M}\mathbf{Q}_{jR})^T\big)^2} \right] \tag{6}$$

where $\mathbf{M}_k$ is a matrix whose elements are all zero except for the k-th column, which is equal to the k-th column of M, $\dot{\mathbf{T}}_k = [3u_k^2 \;\, 2u_k \;\, 1 \;\, 0]$, and ηQ is the learning rate.
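Operationally, rule (6) touches only the 4×4 patch of control points active for the current sample, indexed by (iR + mR, iJ + mJ) with mR, mJ ∈ {0, ..., 3}. A schematic sketch, in which grad_patch stands for the bracketed term of (6), assumed precomputed, and all names are illustrative:

```python
def adapt_patch(Q, i_R, i_J, grad_patch, eta_q=0.0005):
    """Local adaptation of one spline LUT (a 2D numpy array): add the
    4x4 gradient patch to the active control points, scaled by 2*eta_q."""
    Q[i_R:i_R + 4, i_J:i_J + 4] += 2.0 * eta_q * grad_patch
    return Q
```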
4. EXPERIMENTAL RESULTS

In order to use complex signals in the simulations, modulated signals were chosen: a PSK-modulated signal, a uniform random noise (whose real and imaginary parts are both uniform white noise) and a 4-QAM signal, each with 3000 samples and with real and imaginary parts artificially correlated (mean correlation coefficient 0.5). In all the experiments the following parameters were used: the learning rate for the W matrix is ηW = 0.0023, while the learning rate for the spline control points is ηQ = 0.0005; the spline step is Δ = 0.3 and the algorithm runs for 900 epochs. In order to provide a quantitative evaluation of the separation, different performance indexes are available in the literature. In this paper the separation index Sj of the j-th source was adopted; the index was originally presented for the real environment, but in this work it has been extended to the complex environment:
$$S_j = 10 \log \frac{E\left\{\left|y_{\sigma(j),j}\right|^2\right\}}{E\left\{\sum_{k \neq j} \left|y_{\sigma(j),k}\right|^2\right\}} \tag{7}$$

In (7), y_{i,j} is the i-th output signal when only the j-th input signal is present, while σ(j) is the output channel corresponding to the j-th input. Figure 3 shows the performance index of the signals separated using the tanh(z) function as AF, as described in [1, 7]. Figure 4 shows the same index for the first 100 epochs of the proposed approach and demonstrates that the training is stable after only 22 epochs and becomes more accurate than the classical approach.

Fig. 3. SIR of the separated signals using a tanh(z) AF

Fig. 4. SIR of the separated signals using the GSAF
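For reference, a direct transcription of (7) under the measurement protocol just described; the array layout and names are assumptions of this sketch.

```python
import numpy as np

def separation_index(Y, sigma, j):
    """Separation index S_j of eq. (7).  Y[i, k, :] is the i-th output
    signal recorded when only the k-th source is active; sigma[j] is
    the output channel corresponding to the j-th source."""
    s = sigma[j]
    num = np.mean(np.abs(Y[s, j, :]) ** 2)                 # E{|y_{s,j}|^2}
    others = np.delete(Y[s], j, axis=0)                    # y_{s,k}, k != j
    den = np.sum(np.mean(np.abs(others) ** 2, axis=1))     # E{sum_{k!=j} |.|^2}
    return 10.0 * np.log10(num / den)
```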
5. CONCLUSION

In this paper a novel complex model of the mixing environment has been introduced and described using the generalized splitting function. The BSS problem in this environment is solved by exploiting an ICA-based algorithm. In particular, the proposed approach extends the well-known ME algorithm to the complex domain; it is based on a couple of flexible spline functions that perform local on-line estimation of the AF. The extension to the complex domain is realized by exploiting the attractive properties of splitting functions. Several tests have been performed to verify the effectiveness of the proposed approach; they demonstrate that it outperforms the classical approaches. The quality of the separation has been evaluated in terms of the Separation Index, which is widely used in the literature.

6. REFERENCES

[1] T. Adali, T. Kim, and V. D. Calhoun, “Independent Component Analysis by Complex Nonlinearities”, in Proc. ICASSP 2004, Vol. 5, pp. 525-528, Montreal, Canada, 2004.
[2] A. J. Bell and T. J. Sejnowski, “An information-maximization approach to blind separation and blind deconvolution”, Neural Computation, Vol. 7, pp. 1129-1159, 1995.
[3] N. Benvenuto, M. Marchesi, F. Piazza and A. Uncini, “Nonlinear satellite radio links equalized using Blind Neural Networks”, in Proc. ICASSP, Vol. 3, pp. 1521-1524, 1991.
[4] V. D. Calhoun and T. Adali, “Complex Infomax: Convergence and Approximation of Infomax With Complex Nonlinearities”, in Proc. NNSP, Switzerland, 2002.
[5] V. D. Calhoun, T. Adali, G. D. Pearlson, and J. J. Pekar, “On Complex Infomax Applied to Complex fMRI Data”, in Proc. ICASSP, Orlando, FL, 2002.
[6] G. M. Georgiou and C. Koutsougeras, “Complex Domain Backpropagation”, IEEE Trans. on Circuits and Systems II, Vol. 39, No. 5, pp. 330-334, May 1992.
[7] T. Kim and T. Adali, “Universal approximation of fully complex feed-forward neural networks”, in Proc. IEEE ICASSP 2002, Vol. 1, pp. 973-976, Orlando, FL, 2002.
[8] T. Kim and T. Adali, “Fully complex backpropagation for constant envelope signal processing”, in Proc. IEEE Workshop on Neural Networks for Signal Processing, pp. 231-240, Sydney, Dec. 2000.
[9] M. Solazzi and A. Uncini, “Regularising Neural Networks using Flexible Multivariate Activation Function”, Neural Networks, Vol. 17, pp. 247-260, 2004.
[10] A. Uncini and F. Piazza, “Blind Signal Processing by Complex Domain Adaptive Spline Neural Networks”, IEEE Transactions on Neural Networks, Vol. 14, No. 2, Mar. 2003.
[11] F. Vitagliano, R. Parisi and A. Uncini, “Generalized Splitting 2D Flexible Activation Function”, Lecture Notes in Computer Science, Springer-Verlag Heidelberg, Vol. 2859, pp. 165-170, May 2003.