10th International Computer Engineering Conference Cairo University, December 29-30, 2014 "Today Information Society What's next?"

M-Estimators Based Activation Functions for Robust Neural Network Learning

Mohamed H. Essai, Electrical Engineering Department, Al-Azhar University, Qena, Egypt, Member IEEE, [email protected]

Ali R. Abd Ellah, Electrical Engineering Department, Al-Azhar University, Qena, Egypt, [email protected]


Abstract - Multi-layer feed-forward neural networks have proven very successful in many applications, such as industrial modeling, classification, and function approximation. Training data containing outliers are often a problem for these supervised learning methods, which may not always achieve acceptable performance. Robust neural network learning algorithms are often applied to deal with the problem of gross errors and outliers. Recently, many researchers have exploited M-estimators as performance functions in order to robustify the NN learning process in the presence of outliers (contaminated data). In this paper we propose, for the first time, M-estimator based activation functions (M-estimator T.Fs) to replace the traditional activation functions (conventional T.Fs), in order to improve the learning process and hence the robustness of neural networks in the presence of outliers. A comparative study between the M-estimator T.Fs and the conventional T.Fs is carried out on a function approximation problem.


Keywords — Activation function, M-estimators, function approximation, Robust Statistics, Back-Propagation.

I. INTRODUCTION

An ANN is a biologically inspired computational model composed of processing elements called artificial neurons. They are connected by coefficients, or weights, which constitute the neural network's structure [1]. Many types of neural networks with different structures have been designed, but all are described by the transfer functions used in their neurons, by the training method or learning rule, and by the connection formula. The neurons have weighted inputs, a transfer function, and outputs for processing information. An ANN is composed of a single layer or multiple layers of neurons. For complex problems the multilayer perceptron (MLP) is the preferred model, as it overcomes the drawbacks of the single-layer perceptron by adding hidden layers.

In our work, and to the best of our knowledge for the first time, we exploit M-estimators as activation functions and investigate their effect on the robustness of the NN learning process in the presence of outliers. Outliers are sample values that differ dramatically from the patterns in the rest of the data and come as a surprise relative to the majority of samples. Outliers are a common feature of many real data sets; their occurrence in raw data ranges from 1% to 10% [7], [9]. They may be due to measurement error, or they may represent significant features in the data. Identifying outliers, and deciding what to do with them, depends on an understanding of the data and its source.

In a feed-forward multilayer neural network, the input signals multiplied by the connection weights are first summed and then passed to a transfer function to give the output of that neuron. The transfer function thus processes the weighted sum of the neuron's inputs [2]. The most valuable property of ANNs is their ability to adapt their behavior to the changing characteristics of the modeled system. Recently, many researchers have investigated a variety of methods to robustify ANN performance in the presence of outliers (noisy data) by optimizing training methods and parameters, or the network structure; comparatively little work has been done on the activation functions [3].

Feed-forward neural networks with one hidden layer using arbitrary activation functions with a bounded range, often called squashing functions, are capable of approximating any function to any desired degree of accuracy using a sufficient number of hidden neurons [4]. The activation function specifies the output of a neuron for a given input. The most commonly used activation functions are the logistic sigmoid function and the hyperbolic tangent sigmoid function (the sigmoidal activation functions). A sigmoid function is real-valued and differentiable, having either a nonnegative or nonpositive first derivative which is bell shaped [5]. The term M-estimator denotes a broad class of estimators of maximum likelihood type, which play an important role in robust statistics. Recently, many researchers have exploited M-estimators as performance functions in order to robustify the NN learning process [6], [7], [8], [9] in the presence of outliers (contaminated data). M-estimators use cost functions that grow more slowly than that of the least squares estimator as the residual departs from zero; when the residual error goes beyond a threshold, the M-estimator suppresses the response instead. The M-estimator based performance function is therefore more robust to the presence of outliers than the least squares (MSE) based performance function. M-estimators replace the MSE performance function and thus provide robustness for the traditional neural network learning algorithms.

The objective of our contribution is to introduce M-estimators, for the first time, as activation functions in order to robustify neural network learning in the presence of outliers.

The outline of this paper is as follows. Section 2 presents activation functions in neural networks. Section 3 presents M-estimators and lists the M-estimator T.Fs. Section 4 discusses outlier analysis and noisy data. Section 5 discusses ANN learning algorithms. Section 6 gives our experimental results. Section 7 concludes the paper.

II. ACTIVATION FUNCTIONS

The activation function is applied to the weighted sum of the inputs of a neuron to produce its output. The choice of activation function can strongly influence the complexity and performance of a neural network and plays an important role in the convergence of the learning algorithms. The most commonly used sigmoidal functions satisfy the requirements of the universal approximation theorem [10].

The characteristics of sigmoid functions are: smooth, continuous, and monotonically increasing (the derivative is always positive); bounded range, but never reaching the maximum or minimum, so "ON" is taken to be slightly less than the maximum and "OFF" slightly greater than the minimum [11].

In general, a sigmoid function is a fairly simple non-linear function, real-valued and differentiable, having either a nonnegative or nonpositive first derivative which is bell shaped, and a pair of horizontal asymptotes. Such a function is especially advantageous for neural networks trained by back-propagation, because its derivative is easy to calculate, which matters when computing the weight updates; it makes the network more easily manipulable mathematically and can dramatically reduce the computational burden of training. Table 1 lists some commonly used conventional T.Fs and their derivatives.

TABLE 1. Conventional T.Fs and their derivatives

  Type     f(x)                   df(x)/dx
  Logsig   1 / (1 + e^(-x))       f(x)(1 - f(x))
  Tansig   2 / (1 + e^(-2x)) - 1  1 - f^2(x)
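As an aside not present in the original paper, the two conventional transfer functions of Table 1 and their derivatives can be written in NumPy as follows; the function names are ours.

import numpy as np

def logsig(x):
    # Logistic sigmoid: f(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def logsig_deriv(x):
    # df/dx = f(x) * (1 - f(x))
    f = logsig(x)
    return f * (1.0 - f)

def tansig(x):
    # Hyperbolic tangent sigmoid: f(x) = 2 / (1 + exp(-2x)) - 1
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def tansig_deriv(x):
    # df/dx = 1 - f(x)^2
    return 1.0 - tansig(x) ** 2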

III. M-ESTIMATORS

M-estimators are a class of estimators belonging to the robust statistics family [12], generated from the maximum likelihood estimators, that are designed to be stable under minor noise perturbations and robust against gross errors in the data. We shall use them in order to robustify the NN learning process [6], [7], [9] in the presence of contaminated data (outliers). The traditional activation functions will be replaced by M-estimator based activation functions (M-estimator T.Fs) in order to improve the learning process and hence the robustness of neural networks in the presence of outliers. Instead of minimizing the sum of squared residuals, an M-estimator minimizes a sum of a function ρ of the residuals, where ρ is some function with the following properties:

- ρ(x) ≥ 0 for all x, with a minimum at 0;
- ρ(x) = ρ(−x) for all x;
- ρ(x) increases as x increases from 0, but does not grow too large as x increases.

Table 2 shows the M-estimator T.Fs ρ(x) and their derivatives ψ(x).

TABLE 2. M-estimator T.Fs and their derivatives

  Type       ρ(x)                              ψ(x)
  L1tf       |x|                               sgn(x)
  Fairtf     c²[ |x|/c − log(1 + |x|/c) ]      x / (1 + |x|/c)
  Hubertf    x²/2           if |x| ≤ k         x           if |x| ≤ k
             k(|x| − k/2)   if |x| ≥ k         k·sgn(x)    if |x| ≥ k
  Cauchytf   (c²/2) log(1 + (x/c)²)            x / (1 + (x/c)²)
  GMtf       (x²/2) / (1 + x²)                 x / (1 + x²)²
  Lmlstf     log(1 + x²/2)                     x / (1 + x²/2)
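For concreteness, a sketch of the ρ(x)/ψ(x) pairs of Table 2 in NumPy is given below. This is our illustrative reading of the table: the default tuning constants k and c are common choices from the robust statistics literature, not values stated in the paper.

import numpy as np

def huber_rho(x, k=1.345):
    # Huber: quadratic near zero, linear beyond |x| = k (k is an assumed default)
    ax = np.abs(x)
    return np.where(ax <= k, 0.5 * x**2, k * (ax - 0.5 * k))

def huber_psi(x, k=1.345):
    return np.where(np.abs(x) <= k, x, k * np.sign(x))

def cauchy_rho(x, c=2.385):
    return 0.5 * c**2 * np.log1p((x / c) ** 2)

def cauchy_psi(x, c=2.385):
    return x / (1.0 + (x / c) ** 2)

def gm_rho(x):
    # Geman-McClure
    return 0.5 * x**2 / (1.0 + x**2)

def gm_psi(x):
    return x / (1.0 + x**2) ** 2

def lmls_rho(x):
    # Least mean log squares
    return np.log1p(0.5 * x**2)

def lmls_psi(x):
    return x / (1.0 + 0.5 * x**2)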

IV. OUTLIERS ANALYSIS AND NOISY DATA

A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are outliers. Most data mining methods discard outliers as noise or exceptions; however, in some applications, such as fraud detection, the rare events can be more interesting than the regularly occurring ones. The analysis of outlier data is referred to as outlier mining.

Outliers may be detected using statistical tests that assume a distribution or probability model for the data, or using distance measures where objects that lie a substantial distance from any cluster are considered outliers. Rather than using statistical or distance measures, deviation-based methods identify outliers by examining differences in the main characteristics of objects in a group [13]. Noisy data is meaningless data. The term has often been used as a synonym for corrupt data, but its meaning has expanded to include any data that cannot be understood and interpreted correctly by machines, such as unstructured text. Any data that has been received, stored, or changed in such a manner that it cannot be read or used by the program that originally created it can be described as noisy.

Noisy data unnecessarily increases the amount of storage space required and can also adversely affect the results of any data mining analysis. Statistical analysis can use information gleaned from historical data to weed out noisy data and facilitate data mining. Noisy data can be caused by hardware failures, programming errors, and gibberish input from speech or optical character recognition (OCR) programs [14].

V. ANN LEARNING ALGORITHMS

Back-propagation [10] is a technique used for training multilayer neural networks in a supervised manner. The back-propagation method, also known as the error back-propagation algorithm, is based on the error-correction learning rule; it is the most often used learning algorithm in neural networks, as it deals with continuous data and differentiable functions for both single-layer and multilayer models. Usually, the back-propagation learning algorithm is used to update the network weights during training in order to improve the network performance [10].

Three training algorithms are presented:

- Trainlm: a network training function that updates weight and bias values according to Levenberg-Marquardt optimization. Trainlm is often the fastest back-propagation algorithm in the toolbox and is highly recommended as a first-choice supervised algorithm, although it requires more memory than other algorithms. Trainlm appears to be the fastest method for training moderate-sized feed-forward neural networks (up to several hundred weights), and it has a memory-reduction feature for use when the training set is large.

- Traincgf: a network training function that updates weight and bias values according to conjugate gradient back-propagation with Fletcher-Reeves updates. The conjugate gradient algorithms are usually much faster than variable-learning-rate back-propagation, and are sometimes faster than trainrp (resilient back-propagation), although the results vary from one problem to another. The conjugate gradient algorithms require only a little more storage than the simpler algorithms, so they are often a good choice for networks with a large number of weights.

- Traincgp: a network training function that updates weight and bias values according to conjugate gradient back-propagation with Polak-Ribiere updates. The traincgp routine has performance similar to traincgf. It is difficult to predict which algorithm will perform best on a given problem. The storage requirements for Polak-Ribiere (four vectors) are slightly larger than for Fletcher-Reeves (three vectors) [15].
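The experiments in this paper use the MATLAB Neural Network Toolbox routines listed above. Purely to illustrate where an M-estimator based activation and its derivative enter the forward and backward passes, the following NumPy sketch trains a small two-layer network with plain batch gradient descent on the MSE, rather than Levenberg-Marquardt or conjugate gradients; every name and hyperparameter here is our own assumption, not the paper's setup.

import numpy as np

def cauchy_act(x, c=2.385):
    # Cauchy psi-function used as a bounded hidden-layer activation (our choice of M-estimator T.F)
    return x / (1.0 + (x / c) ** 2)

def cauchy_act_deriv(x, c=2.385):
    # d/dx [ x / (1 + (x/c)^2) ] = (1 - (x/c)^2) / (1 + (x/c)^2)^2
    u = (x / c) ** 2
    return (1.0 - u) / (1.0 + u) ** 2

def train_mlp(x, t, n_hidden=10, lr=0.01, epochs=5000, seed=0):
    # Two-layer feed-forward net with a linear output, trained by batch gradient descent on the MSE.
    rng = np.random.default_rng(seed)
    w1 = rng.normal(scale=0.5, size=(1, n_hidden)); b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=0.5, size=(n_hidden, 1)); b2 = np.zeros(1)
    x = x.reshape(-1, 1); t = t.reshape(-1, 1)
    for _ in range(epochs):
        a = x @ w1 + b1                        # hidden pre-activations
        h = cauchy_act(a)                      # M-estimator based activation
        y = h @ w2 + b2                        # linear output layer
        e = y - t
        g_w2 = h.T @ e / len(x); g_b2 = e.mean(axis=0)
        dh = (e @ w2.T) * cauchy_act_deriv(a)  # error back-propagated to the hidden layer
        g_w1 = x.T @ dh / len(x); g_b1 = dh.mean(axis=0)
        w2 -= lr * g_w2; b2 -= lr * g_b2
        w1 -= lr * g_w1; b1 -= lr * g_b1
    return lambda xi: (cauchy_act(np.asarray(xi).reshape(-1, 1) @ w1 + b1) @ w2 + b2).ravel()

A predictor trained this way can then be scored with the RMSE of eq. (2) on the 501-point grid described in Section VI; the sketch shows the mechanics only, not the toolbox algorithms actually used for the reported results.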

VI. SIMULATION RESULTS

In this section, the performance of feed-forward neural networks (FFNNs) constructed using M-estimator based activation functions, and of those using the traditional (tansig and logsig) activation functions, all trained with the trainlm, traincgf, and traincgp learning algorithms, is evaluated on a function approximation application.

Numerous engineering problems in signal processing, computer vision, and pattern recognition can be abstracted into the task of approximating an unknown function from a training set of input-output pairs. It is hypothesized that the input vector x and the output vector Y are related by an unknown function f such that

Y = f(x) + e

where the output noise term e is a random vector due to the imprecise measurements made by physical devices in real-world environments. The function approximation task can be summarized as finding an estimator f̂ of f such that some metric of the approximation error is minimized [16].

The function to be approximated is

y = |x|^(2/3)    (1)

This function was also used in [6], [7], [8], [9]. The neural network architecture considered is a two-layer feed-forward network with ten hidden neurons. A total of 501 training patterns were generated by sampling the independent variable in the range [−2, 2] and using (1) to calculate the dependent variable. To compare the performance of all the above-mentioned activation functions, we use the root mean square error (RMSE) of each model,

RMSE = √( Σ_{i=1}^{N} (t_i − y_i)² / N )    (2)

where the target t_i is the actual value of the function at x_i and y_i is the output of the network given x_i as its input.

We study the performance of the neural networks in four cases, according to the percentage of outliers:

- Clean data: neural networks trained with high-quality, clean (noise-free) data.
- Set A: neural networks trained with high-quality data corrupted with small Gaussian noise G2 ~ N(0, 0.1).
- Set B: neural networks trained with data corrupted with the Gaussian noise G2, in addition to high-value random outliers of the form H1 ~ N(−15, 2), H2 ~ N(−20, 3), H3 ~ N(+30, 1.5), H4 ~ N(12, 3). The data perturbation used in this case is

  Data = (1 − ε%) G2 + ε% (H1 + H2 + H3 + H4)

  with the outliers introduced at ε% = 0.1.
- Set C: neural networks trained with 49% of the data corrupted with Gaussian noise G2 ~ N(0, 0.1) and the remaining 51% of the data substituted by uniformly distributed background noise.
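As a rough reconstruction of this data-generation procedure (the exact way the four outlier distributions are combined per sample and the range of the uniform background noise in Set C are our assumptions; the paper gives only the mixture formula), the training sets and the RMSE of eq. (2) could be produced as follows.

import numpy as np

rng = np.random.default_rng(0)

# 501 training patterns on [-2, 2]; targets from eq. (1): y = |x|^(2/3)
x = np.linspace(-2.0, 2.0, 501)
y_clean = np.abs(x) ** (2.0 / 3.0)

# Set A: clean targets plus small Gaussian noise G2 ~ N(0, 0.1)
g2 = rng.normal(0.0, 0.1, size=x.shape)
y_set_a = y_clean + g2

# Set B: mixture of G2 and the high-value outliers H1..H4 with eps = 0.1
eps = 0.1
h = (rng.normal(-15, 2, x.shape) + rng.normal(-20, 3, x.shape) +
     rng.normal(30, 1.5, x.shape) + rng.normal(12, 3, x.shape))
y_set_b = y_clean + (1.0 - eps) * g2 + eps * h

# Set C: 49% of samples carry Gaussian noise, 51% are replaced by uniform
# background noise (the range of the uniform noise is an assumption)
replace = rng.random(x.shape) < 0.51
background = rng.uniform(y_clean.min(), y_clean.max(), size=x.shape)
y_set_c = np.where(replace, background, y_clean + g2)

def rmse(t, y):
    # eq. (2): root mean square error between targets t and network outputs y
    return float(np.sqrt(np.mean((np.asarray(t) - np.asarray(y)) ** 2)))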

Case 1: Neural networks trained with the trainlm algorithm, MSE as the performance function, and a two-layer NN with 10, 15, 20, and 25 hidden neurons. The results are shown in Tables 3-6.

TABLE 3. Best performance of neural networks using the Trainlm algorithm with 10 hidden neurons

  Set          M-Estimator T.F       RMSE         Conventional T.F   RMSE
  Clean data   GMtf                  0.0021       Tansig             0.0024
  Set A        GMtf                  0.0235       Tansig             0.0287
  Set B        GMtf                  0.2028       Tansig             0.2840
  Set C        GMtf                  0.2845       Logsig             1.3219

Table 3 shows that, when using the Trainlm training algorithm, MSE as the performance function, and a two-layer NN with 10 hidden neurons, the M-estimator based transfer function GMtf (the best among the proposed T.Fs) outperforms the conventional Tansig and Logsig T.Fs.

TABLE 4. Best performance of neural networks using the Trainlm algorithm with 15 hidden neurons

  Set          M-Estimator T.F       RMSE         Conventional T.F   RMSE
  Clean data   GMtf                  0.00091532   Logsig             0.0021
  Set A        L1tf                  0.0236       Tansig             0.0287
  Set B        Cauchytf              0.2305       Logsig             0.2756
  Set C        GMtf                  0.5639       Logsig             0.6369

Table 4 shows that, when using the Trainlm training algorithm, MSE as the performance function, and a two-layer NN with 15 hidden neurons, the M-estimator based transfer functions generally outperform the conventional Tansig and Logsig T.Fs.

TABLE 5. Best performance of neural networks using the Trainlm algorithm with 20 hidden neurons

  Set          M-Estimator T.F       RMSE         Conventional T.F   RMSE
  Clean data   Hubertf, L1tf, GMtf   0.0027       Logsig, Tansig     0.0028
  Set A        Lmlstf                0.0237       Tansig             0.0280
  Set B        GMtf                  0.2409       Logsig             0.2610
  Set C        Cauchytf              0.5254       Logsig             1.3231

Table 5 shows that, when using the Trainlm training algorithm, MSE as the performance function, and a two-layer NN with 20 hidden neurons, the M-estimator based transfer functions generally outperform the conventional Tansig and Logsig T.Fs.

TABLE 6. Best performance of neural networks using the Trainlm algorithm with 25 hidden neurons

  Set          M-Estimator T.F       RMSE         Conventional T.F   RMSE
  Clean data   GMtf                  0.0014       Logsig             0.0023
  Set A        L1tf                  0.0245       Logsig             0.0313
  Set B        GMtf                  0.1988       Logsig             0.2005
  Set C        Cauchytf              0.5695       Logsig             0.9107

Table 6 shows that, when using the Trainlm training algorithm, MSE as the performance function, and a two-layer NN with 25 hidden neurons, the M-estimator based transfer functions generally outperform the conventional Logsig T.F (the best among the examined conventional T.Fs).

Case 2: Neural networks trained with the traincgf algorithm, MSE as the performance function, and a two-layer NN with 10, 15, 20, and 25 hidden neurons. The results are shown in Tables 7-10.

TABLE 7. Best performance of neural networks using the Traincgf algorithm with 10 hidden neurons

  Set          M-Estimator T.F       RMSE         Conventional T.F   RMSE
  Clean data   Hubertf               0.0102       Logsig             0.009
  Set A        Hubertf               0.0274       Logsig             0.0308
  Set B        Lmlstf                0.2066       Logsig             0.2194
  Set C        Cauchytf              0.3192       Logsig             1.0001

Table 7 shows that, when using the Traincgf training algorithm, MSE as the performance function, and a two-layer FFNN with 10 hidden neurons, the M-estimator based transfer functions generally outperform the conventional Logsig T.F (the best among the examined conventional T.Fs), except in the case of clean training data, where Logsig outperforms all of the proposed T.Fs.

TABLE 8. Best performance of neural networks using the Traincgf algorithm with 15 hidden neurons

  Set          M-Estimator T.F       RMSE         Conventional T.F   RMSE
  Clean data   Lmlstf                0.0093       Logsig             0.0109
  Set A        L1tf                  0.0261       Logsig             0.0342
  Set B        GMtf                  0.2139       Logsig             0.2187
  Set C        Lmlstf                0.3772       Logsig             1.1270

Table 8 shows that, when using the Traincgf training algorithm, MSE as the performance function, and a two-layer FFNN with 15 hidden neurons, the M-estimator based transfer functions generally outperform the conventional Logsig T.F (the best among the examined conventional T.Fs).

TABLE 9. Best performance of neural networks using the Traincgf algorithm with 20 hidden neurons

  Set          M-Estimator T.F       RMSE         Conventional T.F   RMSE
  Clean data   Hubertf, Lmlstf       0.0082       Logsig             0.0087
  Set A        L1tf                  0.0272       Logsig             0.0346
  Set B        Lmlstf                0.1327       Tansig             0.2954
  Set C        Lmlstf                0.7191       Logsig             1.1650

Table 9 shows that, when using the Traincgf training algorithm, MSE as the performance function, and a two-layer FFNN with 20 hidden neurons, the M-estimator based transfer functions generally outperform the conventional Logsig and Tansig T.Fs.

TABLE 10. Best performance of neural networks using the Traincgf algorithm with 25 hidden neurons

  Set          M-Estimator T.F       RMSE         Conventional T.F   RMSE
  Clean data   L1tf                  0.0098       Logsig             0.0108
  Set A        L1tf                  0.0276       Logsig             0.0401
  Set B        GMtf                  0.2345       Logsig             0.2749
  Set C        Lmlstf                0.6002       Logsig             1.1969

Table 10 shows that, when using the Traincgf training algorithm, MSE as the performance function, and a two-layer FFNN with 25 hidden neurons, the M-estimator based transfer functions generally outperform the conventional Logsig T.F (the best among the examined conventional T.Fs).

Case 3: Neural networks trained with the traincgp algorithm, MSE as the performance function, and a two-layer NN with 10, 15, 20, and 25 hidden neurons. The results are shown in Tables 11-14.

TABLE 11. Best performance of neural networks using the Traincgp algorithm with 10 hidden neurons

  Set          M-Estimator T.F       RMSE         Conventional T.F   RMSE
  Clean data   Hubertf               0.0090       Logsig             0.0101
  Set A        Hubertf               0.0299       Logsig             0.0311
  Set B        GMtf                  0.1562       Logsig             0.2675
  Set C        Hubertf               0.4765       Tansig             1.0215

Table 11 shows that, when using the Traincgp training algorithm, MSE as the performance function, and a two-layer FFNN with 10 hidden neurons, the M-estimator based transfer functions generally outperform the conventional Logsig and Tansig T.Fs.

TABLE 12. Best performance of neural networks using the Traincgp algorithm with 15 hidden neurons

  Set          M-Estimator T.F       RMSE         Conventional T.F   RMSE
  Clean data   Lmlstf                0.0094       Logsig             0.0098
  Set A        L1tf                  0.0277       Logsig             0.0386
  Set B        GMtf                  0.2012       Logsig             0.2916
  Set C        Lmlstf                0.6851       Logsig             1.2283

Table 12 shows that, when using the Traincgp training algorithm, MSE as the performance function, and a two-layer FFNN with 15 hidden neurons, the M-estimator based transfer functions generally outperform the conventional Logsig T.F (the best among the examined conventional T.Fs).

TABLE 13. Best performance of neural networks using the Traincgp algorithm with 20 hidden neurons

  Set          M-Estimator T.F       RMSE         Conventional T.F   RMSE
  Clean data   L1tf                  0.0095       Logsig             0.0101
  Set A        L1tf                  0.0279       Logsig             0.0357
  Set B        GMtf                  0.2125       Logsig             0.2863
  Set C        Lmlstf                0.6071       Logsig             1.0654

Table 13 shows that, when using the Traincgp training algorithm, MSE as the performance function, and a two-layer FFNN with 20 hidden neurons, the M-estimator based transfer functions generally outperform the conventional Logsig T.F (the best among the examined conventional T.Fs).

TABLE 14. Best performance of neural networks using the Traincgp algorithm with 25 hidden neurons

  Set          M-Estimator T.F       RMSE         Conventional T.F   RMSE
  Clean data   Hubertf               0.0087       Logsig             0.0101
  Set A        Hubertf               0.0258       Logsig             0.0410
  Set B        Cauchytf              0.1622       Logsig             0.3407
  Set C        GMtf                  1.0175       Tansig             1.1160

Table 14 shows that, when using the Traincgp training algorithm, MSE as the performance function, and a two-layer FFNN with 25 hidden neurons, the M-estimator based transfer functions generally outperform the conventional Logsig and Tansig T.Fs.

VII. CONCLUSION

In this paper we introduced a family of robust statistics M-estimators as activation functions (M-estimator T.Fs). This family is well known to provide high reliability for robust NN training in the presence of contaminated data. We compared these functions against some traditional activation functions from the literature (conventional T.Fs). Simulation results show that the proposed M-estimator T.Fs outperform the conventional T.Fs in all examined cases.

REFERENCES

[1] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., Pearson Prentice Hall, 2005.
[2] B. Sharma and K. Venugopalan, "Comparison of Neural Network Training Functions for Hematoma Classification in Brain CT Images," IOSR Journal of Computer Engineering (IOSR-JCE), vol. 16, issue 1, ver. II, pp. 31-35, Jan. 2014.
[3] B. Karlik and A. V. Olgac, "Performance Analysis of Various Activation Functions in Generalized MLP Architectures of Neural Networks," International Journal of Artificial Intelligence and Expert Systems (IJAE), vol. 1, issue 4, pp. 111-122, 2010.
[4] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, pp. 359-366, 1989.
[5] K. Mehrotra, C. K. Mohan, and S. Ranka, Elements of Artificial Neural Networks, 1996.
[6] M. M. Zahra, M. H. Essai, and A. R. Abd Ellah, "Performance Functions Alternatives of MSE for Neural Networks Learning," International Journal of Engineering Research & Technology (IJERT), vol. 3, issue 1, pp. 967-970, Jan. 2014.
[7] M. M. Zahra, M. H. Essai, and A. R. Abd Ellah, "Robust Neural Network Classifier," International Journal of Engineering Development and Research (IJEDR), ISSN 2321-9939, pp. 326-331, Jan. 2014.
[8] M. T. El-Melegy and M. Essai, "From Robust Statistics to Artificial Intelligence: M-estimators for Training Feed-Forward Neural Networks," Al-Azhar Engineering Ninth International Conference (AEIC), vol. 2, no. 5, pp. 85-100, Apr. 2007.
[9] M. El-Melegy, M. Essai, and A. Ali, "Robust training of artificial feedforward neural networks," Springer, vol. 1, pp. 217-242, Jun. 2009.
[10] Saduf and M. A. Wani, "Comparative Study of Back Propagation Learning Algorithms for Neural Networks," International Journal of Advanced Research in Computer Science and Software Engineering (IJARCSSE), vol. 3, issue 12, pp. 1151-1156, Dec. 2013.
[11] V. Cheung and K. Cannons, "An Introduction to Neural Networks," Manitoba, Canada, May 27, 2002.
[12] P. J. Huber, Robust Statistics, John Wiley & Sons, New York, 1981.
[13] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2nd ed., 2006.
[14] C. Patel, A. Patel, and D. Patel, "Optical Character Recognition by Open Source OCR Tool Tesseract: A Case Study," International Journal of Computer Applications, vol. 55, no. 10, pp. 50-56, October-20.
[15] H. Demuth, M. Beale, and M. Hagan, Neural Network Toolbox 7 User's Guide.
[16] S. Chatterjee and M. Laudato, "Statistical Applications of Neural Networks," 1995.

Ali Refaee Abd Ellah Mohamed received his BSc degree in electronics and communication engineering from Al-Azhar University, Qena, Egypt, in 2009. He is currently working toward the MSc degree in electrical engineering at Al-Azhar University, Cairo, Egypt. He is currently a teaching assistant in the Electrical Engineering Department, electronics and communication branch, Faculty of Engineering, Al-Azhar University, Qena, Egypt. His current research interests are the robustness of artificial neural network learning algorithms and ANN applications. He has 2 publications in international journals. E-mail: [email protected], [email protected]

Mohamed Hassan Essai Ali received his BSc degree in electronics and communication engineering from Al-Azhar University, Qena, Egypt, in 2001, the MSc degree in electrical engineering from the School of Engineering, Assiut University, Assiut, Egypt, in 2007, and the PhD degree in technical science (radio engineering, including systems and devices of television) from Novosibirsk State Technical University, Novosibirsk, Russian Federation, in March 2012. From October 2012 to March 2014 he was an assistant professor at the Electrical Engineering Department, electronics and communication branch, Faculty of Engineering, Al-Azhar University, Qena, Egypt; from 1 April 2014 to 25 December 2014 he was on an informal post-doctoral mission at Novosibirsk State Technical University, Novosibirsk, Russian Federation. His research interests include the theory and applications of robust statistics, multi-user detection, channel estimation of signals under a priori uncertainty for telecommunications problems, CDMA and spread-spectrum systems, OFDM systems, MIMO systems, and applications of neural networks. He has 21 publications in Russian and English. E-mail: [email protected], [email protected]



