Extreme Learning Machine for Multi-Categories Classification Applications

Hai-Jun Rong1,2, Guang-Bin Huang1 and Yew-Soon Ong2
1 School of Electrical and Electronic Engineering
2 School of Computer Engineering
Nanyang Technological University, Nanyang Avenue, Singapore 639798
E-mail: {hjrong, egbhuang, asysong}@ntu.edu.sg

IEEE World Congress on Computational Intelligence, Hong Kong, June 1-6, 2008
ELM Web Portal: www.ntu.edu.sg/home/egbhuang

Extreme Learning Machines
Outline
1. Neural Networks
   Single-Hidden Layer Feedforward Networks (SLFNs)
   Conventional Learning Algorithms of SLFNs
2. Extreme Learning Machine
   Unified Learning Platform
   ELM Algorithm
3. ELM for Multi-Categories Classification Problems
4. Performance Evaluations
5. Summary
Feedforward Neural Networks with Additive Nodes

Output of hidden nodes:
    G(a_i, b_i, x) = g(a_i · x + b_i)    (1)
a_i: the weight vector connecting the ith hidden node and the input nodes.
b_i: the threshold of the ith hidden node.

Output of SLFNs:
    f_L(x) = Σ_{i=1}^{L} β_i G(a_i, b_i, x)    (2)
β_i: the weight vector connecting the ith hidden node and the output nodes.

Figure 1: Feedforward network architecture with additive hidden nodes.
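In code, Eqs. (1)-(2) are an affine map followed by a nonlinearity, then a weighted sum. A minimal NumPy sketch; the sigmoid choice for g, the toy sizes, and the random parameter values are illustrative assumptions, not from the slides:

```python
import numpy as np

def additive_hidden_output(A, b, x, g=lambda z: 1.0 / (1.0 + np.exp(-z))):
    """Eq. (1): G(a_i, b_i, x) = g(a_i . x + b_i), for all L nodes at once.

    A: (L, n) matrix whose rows are the input weight vectors a_i.
    b: (L,) vector of hidden node thresholds b_i.
    x: (n,) input vector.  Returns the (L,) vector of hidden node outputs.
    """
    return g(A @ x + b)

def slfn_output(beta, A, b, x):
    """Eq. (2): f_L(x) = sum_i beta_i * G(a_i, b_i, x)."""
    return additive_hidden_output(A, b, x) @ beta  # beta: (L, m)

rng = np.random.default_rng(0)
L, n, m = 5, 3, 2                       # hidden nodes, inputs, outputs (toy sizes)
A, b = rng.normal(size=(L, n)), rng.normal(size=L)
beta = rng.normal(size=(L, m))
print(slfn_output(beta, A, b, rng.normal(size=n)).shape)  # (2,)
```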
Feedforward Neural Networks with RBF Nodes

Output of hidden nodes:
    G(a_i, b_i, x) = g(b_i ‖x − a_i‖)    (3)
a_i: the center of the ith hidden node.
b_i: the impact factor of the ith hidden node.

Output of SLFNs:
    f_L(x) = Σ_{i=1}^{L} β_i G(a_i, b_i, x)    (4)
β_i: the weight vector connecting the ith hidden node and the output nodes.

Figure 2: Feedforward network architecture with RBF hidden nodes.
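Eq. (3) differs from the additive case only in how the scalar fed to g is formed: a scaled distance to a center rather than an inner product. A sketch, assuming the common Gaussian choice g(z) = exp(−z²) (the slides leave g generic):

```python
import numpy as np

def rbf_hidden_output(centers, impact, x):
    """Eq. (3): G(a_i, b_i, x) = g(b_i * ||x - a_i||), with g(z) = exp(-z^2)
    as an assumed (Gaussian) radial activation.

    centers: (L, n) matrix whose rows are the centers a_i.
    impact:  (L,) positive impact factors b_i.
    x:       (n,) input vector.  Returns the (L,) vector of hidden node outputs.
    """
    dist = np.linalg.norm(x - centers, axis=1)   # ||x - a_i|| for each node
    return np.exp(-(impact * dist) ** 2)

rng = np.random.default_rng(1)
L, n = 4, 3
centers = rng.normal(size=(L, n))
impact = rng.uniform(0.5, 2.0, size=L)
h = rbf_hidden_output(centers, impact, centers[0])
print(h[0])  # node 0 is evaluated at its own center, so its output is exactly 1.0
```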
Function Approximation of Neural Networks

Learning Model: For N arbitrary distinct samples (x_i, t_i) ∈ R^n × R^m, SLFNs with L hidden nodes and activation function g(x) are mathematically modeled as
    f_L(x_j) = o_j,  j = 1, · · · , N    (5)

Cost function:
    E = Σ_{j=1}^{N} ‖o_j − t_j‖²

The target is to minimize the cost function E by adjusting the network parameters β_i, a_i, b_i.

Figure 3: Feedforward network architecture.
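The cost E above is just the sum of squared output errors over all N samples. A tiny sketch (the toy target values are purely illustrative):

```python
import numpy as np

def sse_cost(O, T):
    """E = sum_j ||o_j - t_j||^2 over all N samples.

    O: (N, m) network outputs o_j.  T: (N, m) targets t_j.
    """
    return np.sum((O - T) ** 2)

T = np.array([[1.0, 0.0], [0.0, 1.0]])
print(sse_cost(T, T))        # 0.0: a perfect fit has zero cost
print(sse_cost(T + 0.5, T))  # 1.0: each of the 4 entries contributes 0.25
```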
Learning Algorithms of Neural Networks

Learning Methods: Many learning methods, mainly based on gradient-descent/iterative approaches, have been developed over the past two decades. Back-Propagation (BP) and its variants are the most popular.

Figure 4: Feedforward network architecture.
Advantages and Disadvantages

Popularity: Widely used in various applications: regression, classification, etc.

Limitations:
- Usually different learning algorithms are used for different SLFN architectures.
- Some parameters have to be tuned manually.
- Overfitting.
- Local minima.
- Time-consuming.
Extreme Learning Machine (ELM)

New Learning Theory: If a continuous target function f(x) can be approximated by SLFNs with adjustable hidden nodes, then the hidden node parameters of such SLFNs need not be tuned. Instead, all these hidden node parameters can be randomly generated without knowledge of the training data.

Given any nonconstant piecewise continuous function g, for any continuous target function f and any randomly generated sequence {(a_i, b_i)}_{i=1}^{L},
    lim_{L→∞} ‖f(x) − f_L(x)‖ = 0
holds with probability one if β_i is chosen to minimize ‖f(x) − f_L(x)‖, i = 1, · · · , L.

Figure 5: Feedforward network architecture with any type of G(a_i, b_i, x).

G.-B. Huang, et al., "Universal Approximation Using Incremental Constructive Feedforward Networks with Random Hidden Nodes," IEEE Transactions on Neural Networks, vol. 17, no. 4, pp. 879-892, 2006.
G.-B. Huang, et al., "Convex Incremental Extreme Learning Machine," Neurocomputing, vol. 70, pp. 3056-3062, 2007.
Unified Learning Platform

Mathematical Model: For N arbitrary distinct samples (x_i, t_i) ∈ R^n × R^m, standard SLFNs with L hidden nodes and output function G are mathematically modeled as
    Σ_{i=1}^{L} β_i G(a_i, b_i, x_j) = t_j,  j = 1, · · · , N    (6)

(a_i, b_i): hidden node parameters.
β_i: the weight vector connecting the ith hidden node and the output nodes.

Figure 6: Feedforward network architecture with any type of G(a_i, b_i, x).
Extreme Learning Machine (ELM)

Mathematical Model: Σ_{i=1}^{L} β_i G(a_i, b_i, x_j) = t_j, j = 1, · · · , N, is equivalent to Hβ = T, where

    H(a_1, · · · , a_L, b_1, · · · , b_L, x_1, · · · , x_N)
      = [ G(a_1, b_1, x_1)  · · ·  G(a_L, b_L, x_1) ]
        [        ⋮          · · ·         ⋮         ]
        [ G(a_1, b_1, x_N)  · · ·  G(a_L, b_L, x_N) ]  (N×L)    (7)

    β = [ β_1^T ]             T = [ t_1^T ]
        [   ⋮   ]       and       [   ⋮   ]
        [ β_L^T ] (L×m)           [ t_N^T ] (N×m)    (8)

H is called the hidden layer output matrix of the neural network; the ith column of H is the output of the ith hidden node with respect to inputs x_1, x_2, · · · , x_N.
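Eqs. (7)-(8) make the whole network linear in β once the hidden node parameters are fixed, so β can be found by least squares. A sketch building H for additive nodes and solving Hβ = T via the pseudoinverse; the tanh activation, random data, and toy sizes are assumptions of this sketch:

```python
import numpy as np

def hidden_matrix(A, b, X, g=np.tanh):
    """Eq. (7): H[j, i] = G(a_i, b_i, x_j) for additive nodes.

    A: (L, n) input weights, b: (L,) thresholds, X: (N, n) inputs.
    Returns H of shape (N, L); column i is hidden node i over all N inputs.
    """
    return g(X @ A.T + b)

rng = np.random.default_rng(2)
N, n, L, m = 50, 4, 10, 3
X, T = rng.normal(size=(N, n)), rng.normal(size=(N, m))
A, b = rng.normal(size=(L, n)), rng.normal(size=L)

H = hidden_matrix(A, b, X)    # N x L hidden layer output matrix, Eq. (7)
beta = np.linalg.pinv(H) @ T  # L x m least-squares solution of H beta = T
print(H.shape, beta.shape)    # (50, 10) (10, 3)
```

The pseudoinverse gives the minimum-norm least-squares solution, which is exactly the β the ELM theory calls for when Hβ = T cannot be satisfied exactly.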
Extreme Learning Machine (ELM)

Three-Step Learning Model: Given a training set ℵ = {(x_i, t_i) | x_i ∈ R^n, t_i ∈ R^m, i = 1, · · · , N}, activation function g, and the number of hidden nodes L:
1. Randomly assign hidden node parameters (a_i, b_i), i = 1, · · · , L.
2. Calculate the hidden layer output matrix H.
3. Calculate the output weight β: β = H†T,
where H† is the Moore-Penrose generalized inverse of the hidden layer output matrix H.

Source Codes of ELM: http://www.ntu.edu.sg/home/egbhuang/
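The three steps above translate directly into a few lines of NumPy. This is a hedged sketch, not the authors' released code (available at the portal above): the sigmoid activation, the uniform random initialization range, the one-hot target encoding, and the toy two-class data are all assumptions of this sketch:

```python
import numpy as np

def elm_train(X, T, L, rng):
    """Three-step ELM with additive sigmoid hidden nodes.

    X: (N, n) inputs.  T: (N, m) targets, one row per sample.  L: hidden nodes.
    Returns (A, b, beta) defining the trained SLFN.
    """
    n = X.shape[1]
    A = rng.uniform(-1, 1, size=(L, n))       # Step 1: random input weights a_i
    b = rng.uniform(-1, 1, size=L)            #         and thresholds b_i
    H = 1.0 / (1.0 + np.exp(-(X @ A.T + b)))  # Step 2: hidden layer output matrix
    beta = np.linalg.pinv(H) @ T              # Step 3: beta = H† T
    return A, b, beta

def elm_predict(X, A, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ A.T + b)))
    return H @ beta

# Toy two-class problem: classify points by the sign of their first coordinate.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
labels = (X[:, 0] > 0).astype(int)
T = np.eye(2)[labels]                  # one-hot targets, one column per class
A, b, beta = elm_train(X, T, L=20, rng=rng)
pred = elm_predict(X, A, b, beta).argmax(axis=1)
print((pred == labels).mean())         # fraction of correct training predictions
```

Note that there is no iteration: the only "training" is one matrix pseudoinverse, which is where the speed advantage claimed on the next slide comes from.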
ELM Learning Algorithm

Salient Features
- "Simple math is enough": ELM is a simple, tuning-free, three-step algorithm.
- The learning speed of ELM is extremely fast.
- Unlike conventional learning methods, which MUST see the training data before generating the hidden node parameters, ELM can generate the hidden node parameters before seeing the training data.
- Unlike traditional gradient-based learning algorithms, which only work for differentiable activation functions, ELM works for all bounded nonconstant piecewise continuous activation functions, including non-differentiable ones.
- Unlike traditional gradient-based learning algorithms, which face issues such as local minima, improper learning rates, and overfitting, ELM tends to reach the solution directly without such issues.
- The ELM learning algorithm is much simpler than many learning algorithms for neural networks and support vector machines.
ELM for Multi-Categories Classification Problems

Three basic methods
1. Single ELM classifier: m output nodes of ELM for m-class applications. We say x is in class l if output node l has the highest output value.
2. One-Against-All ELM (ELM-OAA): the m-class classification problem is implemented by m binary ELM classifiers, each of which is trained independently to separate one of the m pattern classes from the rest.
3. One-Against-One ELM (ELM-OAO): the m pattern classes are pairwise decomposed into m(m − 1)/2 two-class problems, and each of them is trained by one binary ELM classifier.

An exponential loss based decoding approach is used in ELM-OAA and ELM-OAO:
E. L. Allwein, et al., "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, pp. 113-141, 2001.
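For concreteness, here is a sketch of the OAO decomposition built from binary ELMs. It uses simple majority voting in place of the exponential loss based decoding cited above, and the sigmoid activation, toy data, and the assumption that labels are 0, ..., m−1 are all choices of this sketch, not of the slides:

```python
import numpy as np
from itertools import combinations

def train_binary_elm(X, y, L, rng):
    """One binary ELM (sigmoid additive nodes); targets y are +1/-1."""
    A = rng.uniform(-1, 1, size=(L, X.shape[1]))
    b = rng.uniform(-1, 1, size=L)
    H = 1.0 / (1.0 + np.exp(-(X @ A.T + b)))
    beta = np.linalg.pinv(H) @ y
    return A, b, beta

def binary_score(X, model):
    A, b, beta = model
    return (1.0 / (1.0 + np.exp(-(X @ A.T + b)))) @ beta

def elm_oao(X, labels, L=15, seed=4):
    """Train m(m-1)/2 pairwise ELMs; predict by majority vote.

    Assumes labels are integers 0, ..., m-1 (used as vote-array columns).
    """
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    models = {}
    for c1, c2 in combinations(classes, 2):        # each unordered pair once
        mask = (labels == c1) | (labels == c2)
        y = np.where(labels[mask] == c1, 1.0, -1.0)
        models[(c1, c2)] = train_binary_elm(X[mask], y, L, rng)

    def predict(Xq):
        votes = np.zeros((len(Xq), len(classes)))
        for (c1, c2), model in models.items():
            s = binary_score(Xq, model)            # >0 votes c1, else c2
            votes[np.arange(len(Xq)), np.where(s > 0, c1, c2)] += 1
        return votes.argmax(axis=1)

    return models, predict

# Toy 3-class problem: the class is the index of the largest coordinate.
rng = np.random.default_rng(5)
X = rng.normal(size=(300, 3))
labels = X.argmax(axis=1)
models, predict = elm_oao(X, labels)
print(len(models))  # 3 classes -> 3*(3-1)/2 = 3 pairwise classifiers
```

The OAA variant differs only in the loop: one classifier per class, trained with +1 for that class and −1 for all others.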
Table 1: Specification of Real-World Classification Benchmark Problems

Type     Datasets                                      # Attributes  # Classes  # Training  # Testing
Type I   Glass                                         9             6          110         104
         Vehicle                                       18            4          420         426
         Page Blocks                                   10            5          2,700       2,773
         Image Segmentation                            19            7          1,100       1,210
         Satellite Image                               36            6          4,400       2,035
         Shuttle                                       9             7          43,500      14,500
         DNA                                           180           3          1,000       1,186
         Optical Recognition of Handwritten Digits     64            10         3,823       1,797
         Pen-Based Recognition of Handwritten Digits   16            10         7,494       3,498
Type II  Cancer                                        98            14         144         46
         Arrhythmia                                    279           16         300         152
         Letter Recognition                            16            26         10,000      10,000
Table 2: Comparison of network complexity of ELM-OAA, ELM and ELM-OAO

Datasets                             ELM-OAA  ELM    ELM-OAO
Glass                                30       30     20
Vehicle                              90       110    50
Page Blocks                          130      160    50
Image Segmentation                   200      210    30
Satellite Image                      450      470    150
Shuttle                              260      340    80
DNA                                  350      450    290
Handwritten Optical Recognition      420      470    110
Handwritten Pen-Based Recognition    650      690    190
Cancer                               40       50     10
Arrhythmia                           50       70     20
Letter Recognition                   1100     1500   510
Table 3: Comparison of testing accuracy (%) of ELM-OAA, ELM and ELM-OAO

Datasets                             ELM-OAA  ELM     ELM-OAO
Glass                                66.346   64.423  65.385
Vehicle                              79.513   79.489  81.378
Page Blocks                          95.873   95.761  95.397
Image Segmentation                   95.216   94.667  94.493
Satellite Image                      89.670   89.663  89.955
Shuttle                              99.667   99.715  99.581
DNA                                  94.866   94.828  94.496
Handwritten Optical Recognition      96.928   96.948  96.343
Handwritten Pen-Based Recognition    98.294   98.302  97.819
Cancer                               78.652   77.522  73.000
Arrhythmia                           65.441   65.395  61.770
Letter Recognition                   93.417   93.029  93.055
Table 4: Comparison of training and testing time (seconds) of ELM-OAA, ELM and ELM-OAO

                                     ELM-OAA (s)        ELM (s)            ELM-OAO (s)
Datasets                             Training Testing   Training Testing   Training Testing
Glass                                0.0250   0.0617    0.0117   0.0773    0.0133   0.0219
Vehicle                              0.1508   0.0617    0.0680   0.0531    0.0547   0.0313
Page Blocks                          1.7125   0.5023    0.6641   0.2070    0.3422   0.2836
Image Segmentation                   3.3547   0.2953    0.6000   0.0906    0.1453   0.1102
Satellite Image                      43.838   1.6500    8.4320   0.2758    5.1039   1.0383
Shuttle                              151.88   7.4594    36.614   1.6516    30.395   4.8422
DNA                                  7.0797   0.6227    5.0328   0.2359    4.7070   0.4578
Handwritten Optical Recognition      39.002   2.0734    6.7414   0.3461    4.2766   2.5727
Handwritten Pen-Based Recognition    140.17   5.3305    18.127   0.7742    17.923   7.3648
Cancer                               0.0491   0.0191    0.0072   0.0194    0.0469   0.0422
Arrhythmia                           0.1250   0.0586    0.0266   0.0273    0.1414   0.2281
Letter Recognition                   2295.6   73.006    76.066   1.9700    1429.5   279.57
Summary
- ELM, ELM-OAO and ELM-OAA obtain similar testing accuracies.
- ELM-OAO usually requires a smaller number of hidden nodes than the single ELM classifier and ELM-OAA.
- The training time required by ELM-OAO is similar to or less than that of ELM and ELM-OAA when the number of pattern classes is small (say, not larger than 10). However, when the number of pattern classes is large (say, larger than 10), the training time of ELM-OAO is most likely higher than that of the single ELM classifier but still smaller than that of ELM-OAA.
Neural Networks ELM ELM for Multi-Categories Classification Problems Performance Evaluations Summary
References
References G.-B. Huang, et al., “Universal Approximation Using Incremental Networks with Random Hidden Computational Nodes”, IEEE Transactions on Neural Networks, vol. 17, no. 4, pp. 879-892, 2006. G.-B. Huang, et al., “Extreme Learning Machine: Theory and Applications,” Neurocomputing, vol. 70, pp. 489-501, 2006. G.-B. Huang, et al., “Convex Incremental Extreme Learning Machine,” Neurocomputing, vol. 70, pp. 3056-3062, 2007. M.-B. Li, et al., “Fully complex extreme learning machine,” Neurocomputing, vol. 68, pp. 306-314, 2005. N.-Y. Liang, et al., “A Fast and Accurate On-line Sequential Learning Algorithm for Feedforward Networks,” IEEE Transactions on Neural Networks, vol. 17, no. 6, pp. 1411-1423, 2006. Q.-Y. Zhu, et al., “Evolutionary Extreme Learning Machine”, Pattern Recognition, vol. 38, no. 10, pp. 1759-1763, 2005. G.-B. Huang, et al., “Can Threshold Networks Be Trained Directly?” IEEE Transactions on Circuits and Systems II, vol. 53, no. 3, pp. 187-191, 2006. G.-B. Huang, et al., “Real-Time Learning Capability of Neural Networks”, IEEE Transactions on Neural Networks, vol. 17, no. 4, pp. 863-878, 2006.