Neural networks. Statistics, Optimisation, and Learning.
Examples sheet 4

4.1. The task is to construct a one-dimensional Kohonen network (Haykin, chapter 9) to learn the properties of a distribution that is uniform inside an equilateral triangle with sides of unit length, and zero outside. [Hint: to generate this distribution, generate at least 1000 points uniformly distributed over the smallest square that contains the triangle, and then accept only points that fall inside the triangle. These points are the input to your network.] Your network needs to have at least 100 output nodes. There are two learning phases. In the first phase (the ordering phase) you should use a time-dependent Gaussian width σ(t) of the neighbourhood function, as well as a time-dependent learning rate η(t). For the former, one uses

σ(t) = σ0 exp(−t/τσ).    (1)

Here, σ0 is a parameter that is typically set to correspond to the largest distance in the output lattice. In your one-dimensional network with 100 output nodes, you can use σ0 = 100. Furthermore, the total learning time in the ordering phase, Torder, should be chosen so that σ(Torder) ≈ 1. Thus, the parameter τσ is usually chosen to satisfy τσ = Torder/ln[σ0]. In your simulations, you can use Torder = 10³, but you are welcome to experiment with other settings. For the learning rate η(t) one uses

η(t) = η0 exp(−t/τσ).    (2)

The parameter η0 is set by the user. A suggestion is to use η0 = 0.1. After the ordering phase, you should fine-tune the weight vectors (the convergence phase). In this phase the learning rate ηconv and the Gaussian width σconv of the neighbourhood function are kept constant and small. A suggestion is to use σconv = 0.9 and ηconv = 0.01. The total learning time Tconv in the convergence phase is typically long (say, Tconv = 5·10⁴ steps). With the settings suggested above, plot the weight vectors of the 100 output nodes that you obtain after the ordering phase and after the convergence phase, together with the desired input triangle. Does the network recognise the triangle? Repeat the simulation with σ0 = 10. How do the results with this value of σ0 differ from the previous ones? Which setting works better? Discuss your results. (2p)
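A minimal implementation sketch of this exercise (Python/NumPy) is given below. The triangle vertices, the random weight initialisation, and the exact Gaussian form of the neighbourhood function around the winning node i0, Λ(i, i0) = exp(−(i − i0)²/(2σ(t)²)), are assumptions not fixed by the exercise text; the parameter values are the suggested ones, and the plotting of the weight vectors is left out.

import numpy as np

rng = np.random.default_rng()

def sample_triangle(n):
    # Rejection sampling: draw points uniformly in the unit square (the smallest
    # square containing the triangle) and keep those that fall inside the
    # equilateral triangle with vertices (0, 0), (1, 0) and (1/2, sqrt(3)/2).
    pts = []
    while len(pts) < n:
        x, y = rng.random(), rng.random()
        if y <= np.sqrt(3) * x and y <= np.sqrt(3) * (1.0 - x):
            pts.append((x, y))
    return np.array(pts)

def train_kohonen_1d(data, n_out=100, sigma0=100.0, eta0=0.1, T_order=1000,
                     sigma_conv=0.9, eta_conv=0.01, T_conv=50000):
    # One-dimensional Kohonen network; returns the weight vectors after the
    # ordering phase and after the convergence phase.
    w = rng.random((n_out, 2))                 # random initial weights in [0, 1]^2
    idx = np.arange(n_out)                     # node positions in the output lattice
    tau_sigma = T_order / np.log(sigma0)       # so that sigma(T_order) is about 1

    def update(w, x, sigma, eta):
        i0 = np.argmin(np.linalg.norm(w - x, axis=1))         # winning node
        lam = np.exp(-(idx - i0) ** 2 / (2.0 * sigma ** 2))   # Gaussian neighbourhood
        return w + eta * lam[:, None] * (x - w)

    for t in range(T_order):                   # ordering phase: sigma(t), eta(t) decay
        sigma = sigma0 * np.exp(-t / tau_sigma)
        eta = eta0 * np.exp(-t / tau_sigma)
        w = update(w, data[rng.integers(len(data))], sigma, eta)
    w_ordering = w.copy()

    for t in range(T_conv):                    # convergence phase: constant, small values
        w = update(w, data[rng.integers(len(data))], sigma_conv, eta_conv)
    return w_ordering, w

data = sample_triangle(1000)
w_ordering, w_conv = train_kohonen_1d(data)    # plot both together with the triangle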

4.2. Train a Kohonen network with a two-dimensional output array of 20 × 20 units to recognise wine classes in the data set on the course home page. Download two files from the course home page: a data set, and a text explaining the variables. The data set is adapted from one of the classification problems on the UCI Machine Learning Repository [1]. The first column in the data set contains the class labels and should not be used as input. [Hint: for each remaining column in the data set, normalise to zero mean and unit variance.] Show how the winning neurons for the different classes are distributed. This can be done by displaying the lattice positions of the winning neurons for each class (and by colouring different classes differently). Suggested parameter values: σ0 = 30, η0 = 0.1, Torder = 10³, σconv = 0.9, ηconv = 0.01, Tconv = 2·10⁴. Does the network group wines of the same class together? (2p)
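A possible preprocessing and training sketch for this task is shown below (Python/NumPy). The file name and delimiter, the random weight initialisation, and the Gaussian neighbourhood function on the 20 × 20 lattice are assumptions rather than part of the exercise; the parameter values are the suggested ones.

import numpy as np

rng = np.random.default_rng()

# "wine_data.txt" is a placeholder name; use the data file from the course home
# page and adjust the delimiter to its actual format.
raw = np.loadtxt("wine_data.txt", delimiter=",")
labels = raw[:, 0].astype(int)                       # first column: class labels
X = raw[:, 1:]
X = (X - X.mean(axis=0)) / X.std(axis=0)             # zero mean, unit variance per column

side = 20
grid = np.array([(i, j) for i in range(side) for j in range(side)], dtype=float)
w = rng.random((side * side, X.shape[1]))            # one weight vector per output unit

def winner(x, w):
    # Index of the output unit whose weight vector is closest to the pattern x.
    return np.argmin(np.linalg.norm(w - x, axis=1))

def update(w, x, sigma, eta):
    i0 = winner(x, w)
    d2 = np.sum((grid - grid[i0]) ** 2, axis=1)      # squared distances on the lattice
    lam = np.exp(-d2 / (2.0 * sigma ** 2))           # Gaussian neighbourhood
    return w + eta * lam[:, None] * (x - w)

# Ordering phase (sigma0 = 30, eta0 = 0.1, T_order = 10^3) followed by the
# convergence phase (sigma_conv = 0.9, eta_conv = 0.01, T_conv = 2 * 10^4).
sigma0, eta0, T_order = 30.0, 0.1, 1000
tau_sigma = T_order / np.log(sigma0)
for t in range(T_order):
    x = X[rng.integers(len(X))]
    w = update(w, x, sigma0 * np.exp(-t / tau_sigma), eta0 * np.exp(-t / tau_sigma))
for t in range(20000):
    w = update(w, X[rng.integers(len(X))], 0.9, 0.01)

# Lattice positions of the winning neurons for all patterns; scatter-plot them
# with one colour per class to see whether wines of the same class group together.
winning_positions = np.array([grid[winner(x, w)] for x in X])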

4.3. In this question you will use a combination of competitive learning and supervised learning (back-propagation). For the competitive-learning part, use 10 Gaussian nodes to classify the wine data set (without the first column). Unlike in the lecture notes (p. 171), where the winning neuron was obtained by minimising the distance from the input pattern, in this case the winning neuron for pattern x is obtained by maximising the Gaussian activation function gj(x):

gj(x) = exp[−‖x − wj‖²/(2sj²)].    (3)

Here ‖ · ‖ denotes the Euclidean norm. Choose the standard deviation sj (radius of attention) of node j as the distance to its nearest neighbour among the other nodes, i.e. sj = min_{i≠j} ‖wj − wi‖. After competitive learning, show the distribution of the winning neurons for the different wine classes. Next, use the outputs gj(x), obtained after the competitive learning is completed, as inputs to three simple perceptrons. Thus, your network has a ten-dimensional input and a three-dimensional output. Using back-propagation, train each perceptron to recognise one of the classes in the wine data set. To do this, use the tanh activation function, and assign to each of the three wine classes one of the following three-dimensional target outputs: (1, −1, −1), (−1, 1, −1), or (−1, −1, 1). Divide the wine data set randomly into two parts: one for training (say, 70% of the data points) and the other for validation (the remaining 30% of the points). Plot the training and the validation energy, as well as the classification error, versus training time. How well do the perceptrons recognise the wine classes? Suggested parameter values for the competitive learning: σ0 = 10, Torder = 10⁴, η0 = 0.1, σconv = 0.9, ηconv = 0.02, Tconv = 10⁵. Suggested parameter values for the back-propagation algorithm: learning rate η = 0.1, at least 3000 training steps. (2p)
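The sketch below (Python/NumPy) illustrates the supervised stage. It assumes that the normalised data matrix X and the class labels (values 1, 2, 3) are available as in 4.2, that the array w_cl holds the 10 weight vectors produced by the competitive-learning stage, that back-propagation for this single layer of output perceptrons reduces to the delta rule with energy H = ½ Σ (t − O)², and that each perceptron has a bias term; all names are placeholders.

import numpy as np

rng = np.random.default_rng()

def gaussian_outputs(X, w_cl):
    # g_j(x) = exp(-||x - w_j||^2 / (2 s_j^2)) for every pattern in X, where
    # s_j = min_{i != j} ||w_j - w_i|| is the distance to the nearest other node.
    D = np.linalg.norm(w_cl[:, None, :] - w_cl[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    s = D.min(axis=1)
    dist2 = np.sum((X[:, None, :] - w_cl[None, :, :]) ** 2, axis=2)
    return np.exp(-dist2 / (2.0 * s ** 2))           # shape (number of patterns, 10)

G = gaussian_outputs(X, w_cl)                        # inputs to the three perceptrons
targets = -np.ones((len(labels), 3))                 # targets (1,-1,-1), (-1,1,-1), (-1,-1,1)
targets[np.arange(len(labels)), labels - 1] = 1.0    # assumes labels take the values 1, 2, 3

perm = rng.permutation(len(G))                       # random 70/30 split
n_train = int(0.7 * len(G))
train, valid = perm[:n_train], perm[n_train:]

W = rng.normal(0.0, 0.1, (3, G.shape[1]))            # weights of the three perceptrons
b = np.zeros(3)                                      # bias terms
eta = 0.1
for step in range(3000):
    p = train[rng.integers(len(train))]              # pick a random training pattern
    O = np.tanh(W @ G[p] + b)
    delta = (targets[p] - O) * (1.0 - O ** 2)        # delta rule for tanh outputs
    W += eta * np.outer(delta, G[p])
    b += eta * delta
    # Record H = 0.5 * sum((targets - O)^2) and the classification error on the
    # training and validation sets at regular intervals for the requested plots.

def classify(rows):
    # Predicted class: the perceptron with the largest output.
    return np.argmax(np.tanh(G[rows] @ W.T + b), axis=1) + 1

validation_error = np.mean(classify(valid) != labels[valid])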

References

[1] The UCI Machine Learning Repository, University of California, Irvine, USA, https://archive.ics.uci.edu/ml
